
Forum: help

RE: simple_dist segmentation fault [ Reply ]
By: Guo Luo on 2013-03-28 22:29
[forum:111078]
Hi XL, thanks a lot! Have a nice weekend!

Best Regards,
Guo Luo

RE: simple_dist segmentation fault [ Reply ]
By: Xavier Lacoste on 2013-03-28 09:54
[forum:111000]
Hello,

You could cite this old publication:
http://www.labri.fr/perso/ramet/bib/Year/2002.complete.html#A:LaBRI::HRR01a

Or, for ongoing work, we have this, but there is no publication yet:
http://www.labri.fr/perso/ramet/bib/Year/2012.complete.html#C:LaBRI::PMAA2012

Thanks,

XL

RE: simple_dist segmentation fault [ Reply ]
By: Guo Luo on 2013-03-27 22:34
[forum:110990]
Hi XL, just a quick question: what papers should I cite if I publish some results computed using PaStiX? I checked the publication list posted on the PaStiX home page but am not sure which one to use. Thanks a lot.

Best Regards,
Guo Luo

RE: simple_dist segmentation fault [ Reply ]
By: Guo Luo on 2013-03-18 22:40
[forum:110785]
Hi XL, thanks for letting me know! My matrix has many more non-zeros because I am using a large stencil (11-by-11) to discretize the Poisson equation. So, as you said, I'll need a "fat" SMP node with plenty of memory.

Thanks again for all your help and have a nice day!

Best Regards,
Guo Luo

RE: simple_dist segmentation fault [ Reply ]
By: Xavier Lacoste on 2013-03-18 15:35
[forum:110776]
Hello,

I wrote a 2D Laplacian example and it required 2.7 GB per node (my output is at the end of this reply).

You definitely need more memory for your problem: if PT-Scotch cannot gather the graph then the factorization won't run, and for the factorization itself you will need much more memory than this.
Here the graph is distributed over 4 processors, so the gathered graph is 4 times bigger than each distributed part.
The distributed matrix with fill-in would be about 30 times bigger.
So here, on 4 processors, the lack of a parallel symbolic factorization is not really the problem. On more processors we will need one, because the gathered graph grows to a size similar to the distributed matrix. So we have to implement one of the existing parallel algorithms to reach bigger problems; it's on our todo list.

Thanks,

XL.

+--------------------------------------------------------------------+
+ PaStiX : Parallel Sparse matriX package +
+--------------------------------------------------------------------+
Matrix size 4198401 x 4198401
Number of nonzeros in A 37761025
+--------------------------------------------------------------------+
+ Options +
+--------------------------------------------------------------------+
Version :
SMP_SOPALIN : Defined
VERSION MPI : Defined
PASTIX_DYNSCHED : Not defined
STATS_SOPALIN : Not defined
NAPA_SOPALIN : Defined
TEST_IRECV : Not defined
TEST_ISEND : Defined
TAG : Exact Thread
FORCE_CONSO : Not defined
RECV_FANIN_OR_BLOCK : Not defined
OUT_OF_CORE : Not defined
DISTRIBUTED : Defined
METIS : Not defined
WITH_SCOTCH : Defined
INTEGER TYPE : int32_t
PASTIX_FLOAT TYPE : double
+--------------------------------------------------------------------+
Time to compute ordering 12.7 s
Time to analyze 2.09 s
Number of nonzeros in factorized matrix 825656704
Fill-in 21.8653
Number of operations (LU) 8.85013e+11
Prediction Time to factorize (AMD 6180 MKL) 38.7 s
--- Sopalin : Threads are binded ---
GMRES :
- iteration 1 :
time to solve 1.89 s
total iteration time 3.66 s
error 2.0455e-14
Max memory used after factorization 2.68 Go
Memory used after factorization 2.13 Go
Static pivoting 0
Time to factorize 303 s
Time to solve 1.83 s
Refinement 1 iterations, norm=2.04547e-14
Time for refinement 3.68 s
||u* - u||/||u*|| : 7.41923784591719e-11
Max memory used after clean 2.68 Go
Memory used after clean 0 o

RE: simple_dist segmentation fault [ Reply ]
By: Guo Luo on 2013-03-16 04:54
[forum:110764]
Hi XL, thanks a lot for your detailed explanation! Unfortunately I am solving a scalar Poisson equation, so your trick doesn't apply. I have done some quick research on symbolic factorization and noticed that many existing packages use sequential algorithms for this step (primarily due to its relatively low CPU cost?). On the other hand, there do exist parallel algorithms for symbolic factorization, for example the work by Grigori et al.

http://dl.acm.org/citation.cfm?id=1328650

(you may already know this work). Anyway, it seems that in my case I'll have to find a more powerful machine, and this is what I'll do next.

Thanks again for all your help!

Best Regards,
Guo Luo

RE: simple_dist segmentation fault [ Reply ]
By: Xavier Lacoste on 2013-03-15 07:57
[forum:110754]
Hello,

This error is a memory allocation error.

We have to gather the graph because we don't have a parallel symbolic factorization in PaStiX for the moment. It's still on the todo list.

I don't know if your code has multiple degrees of freedom per node. If so, you can use IPARM_DOF_NBR to avoid that bottleneck by using a compressed graph.

e.g.:
1 5 X X X X
2 6 X X X X
X X 9 1 X X
X X 0 2 X X
3 7 X X 3 5
4 8 X X 4 6

can be written as:
colptr={1,5,9,11,13,15,17}
rows = {1,2,5,6,1,2,5,6,3,4,3,4,5,6,5,6}
vals = {1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6}

or, with IPARM_DOF_NBR = 2 (each node's 2x2 block is stored column by column in vals):
colptr = {1, 3, 4, 5}
rows = {1, 3, 2, 3}
vals = {1, 2, 5, 6, 3, 4, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6}

This way the graph becomes small and the memory allocation error won't appear.
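
To make the layout concrete, here is a minimal C sketch of the compressed input under the assumption of 1-based numbering, 64-bit integers and double values (the type and variable names below are illustrative, not the ones from the PaStiX headers):

#include <stdint.h>

/* Illustrative sketch only: the 6x6 matrix above seen as a 3x3 matrix of
 * 2x2 nodes (dof = 2).  colptr/rows give the compressed 3x3 graph in
 * 1-based CSC form; vals stores each 2x2 block column by column. */
typedef int64_t pint;

static const pint   n        = 3;              /* number of nodes (compressed columns) */
static const pint   colptr[] = { 1, 3, 4, 5 };
static const pint   rows[]   = { 1, 3, 2, 3 };
static const double vals[]   = { 1, 2, 5, 6,   /* block (1,1) */
                                 3, 4, 7, 8,   /* block (3,1) */
                                 9, 0, 1, 2,   /* block (2,2) */
                                 3, 4, 5, 6 }; /* block (3,3) */

/* Before the solver call one would also set
 *   iparm[IPARM_DOF_NBR] = 2;   -- two degrees of freedom per node */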

The 83-million-unknown problem was run with a centralized matrix, but on nodes with huge memory (128 GB or more, if I remember correctly...).

So, if you can, I would advise you to use IPARM_DOF_NBR, which can solve this issue and also greatly improve the reordering time.

XL.

RE: simple_dist segmentation fault [ Reply ]
By: Guo Luo on 2013-03-15 01:30
[forum:110753]
Hi XL, thanks a lot! I applied the patch and it does work. The test run

> mpirun -np 4 simple_dist -lap 10

now goes through (and so do -lap 30 and -lap 50).

I applied the dpastix solver to a 2D Poisson equation (close to what I really want to solve) on a mesh of size N*N. For N up to 1024 the solver returns successfully, with the error of the computed solution decaying at the expected order. However, when I applied the solver to a 2048*2048 mesh, with 4.2 million unknowns and 255.5 million non-zero entries, it crashed with the error message:

ERROR: dgraphGatherAll2: out of memory (2)

The complete output is shown below:

]>mpirun -np 32 ./axi_mpi_s4mm6_gc_64_p.e < input_mode_eu_13
... ...

Start computation with npes=32 and:
tot_nr=2048
tot_nz=2048
... ...

+--------------------------------------------------------------------+
+ PaStiX : Parallel Sparse matriX package +
+--------------------------------------------------------------------+
Matrix size 4202499 x 4202499
Number of nonzeros in A 255543254
+--------------------------------------------------------------------+
+ Options +
+--------------------------------------------------------------------+
Version : exported
SMP_SOPALIN : Defined
VERSION MPI : Defined
PASTIX_DYNSCHED : Not defined
STATS_SOPALIN : Not defined
NAPA_SOPALIN : Defined
TEST_IRECV : Not defined
TEST_ISEND : Defined
TAG : Exact Thread
FORCE_CONSO : Not defined
RECV_FANIN_OR_BLOCK : Not defined
OUT_OF_CORE : Not defined
DISTRIBUTED : Defined
METIS : Defined
WITH_SCOTCH : Defined
INTEGER TYPE : int64_t
FLOAT TYPE : double
+--------------------------------------------------------------------+
Check : Numbering OK
Check : Sort CSC OK
Check : Duplicates OK
Ordering :

WARNING: PaStiX works only with PT-Scotch default strategy

Time to compute ordering 31.2 s
(2): ERROR: dgraphGatherAll2: out of memory (2)
(18): ERROR: dgraphGatherAll2: out of memory (2)
(30): ERROR: dgraphGatherAll2: out of memory (2)
(10): ERROR: dgraphGatherAll2: out of memory (2)
--------------------------------------------------------------------------


I realized that this is a PT-Scotch error and the problem seems to be caused by the gathering operation that attempts to build a centralized graph from a distributed graph. Since the complete graph is huge and the memory available on a single SMP node is limited (16 GB), the gathering operation could not be completed.

I am just wondering whether this means that dpastix cannot be used to solve problems of this size, or whether I should modify certain parameters of the PaStiX / PT-Scotch packages. I noticed that PaStiX has been successfully applied to a 3D problem with more than 83 million unknowns (announced on your main page), so my problem, with "merely" 4.2 million unknowns, should by no means be considered "large". It would really be very helpful if you could give me any hints.

Thanks again for all your help!

Best Regards,
Guo Luo

RE: simple_dist segmentation fault [ Reply ]
By: Xavier Lacoste on 2013-03-14 12:48
[forum:110747]
Hello,

This patch should remove the problem:

diff --git sopalin/src/pastix.c sopalin/src/pastix.c
index c126f8f..cba5ef0 100644
--- sopalin/src/pastix.c
+++ sopalin/src/pastix.c
@@ -3012,8 +3012,7 @@ int pastix_fake_fillin_csc( pastix_data_t *pastix_data,
RETURN_ERROR(retval_recv);

malcsc = API_YES;
- if (l_b != NULL)
- mal_l_b = API_YES;
+ mal_l_b = API_YES;
}
else
# endif /* DISTRIBUTED */
@@ -3170,8 +3169,7 @@ int pastix_fillin_csc( pastix_data_t *pastix_data,
RETURN_ERROR(retval_recv);

malcsc = API_YES;
- if (l_b != NULL)
- mal_l_b = API_YES;
+ mal_l_b = API_YES;
}
else
# endif /* DISTRIBUTED */

It was a reduction that was called by only some of the processes.

Then, when the other processes reached another reduction, the error was raised.

Thanks again,

XL

RE: simple_dist segmentation fault [ Reply ]
By: Guo Luo on 2013-03-14 09:42
[forum:110745]
Hi XL, I got another question for you. When I manually distribute a matrix A and a right-hand side vector b among N processors, is it possible to assign 0 entries to a processor? For example, if I want to distribute a 10-by-10 matrix to 4 processors, can I assign the columns in such a way that the 4 processors have 4, 4, 2, and 0 columns, respectively?
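
For concreteness, here is the kind of distribution I have in mind, written as a small C sketch under the assumption that each process describes its columns through a loc2glob-style array as in the simple_dist example (the names below are only illustrative):

#include <stdint.h>

typedef int64_t pint;   /* assumed 64-bit integer type, as in my build */

/* Hypothetical 10-column matrix split over 4 MPI processes as 4 / 4 / 2 / 0
 * columns: each process lists the global indices of its local columns. */
static const pint loc2glob_rank0[] = { 1, 2, 3, 4 };
static const pint loc2glob_rank1[] = { 5, 6, 7, 8 };
static const pint loc2glob_rank2[] = { 9, 10 };
/* rank 3: local column count 0, i.e. an empty list -- the case in question */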

As an attempt to answer this question, I've run the test program simple_dist (with MPI_Init_thread() replaced by MPI_Init()) using the driver -lap 10. The result is as follows:

> mpirun -np 4 ./simple_dist -lap 10
MPI_Init_thread level = MPI_THREAD_SINGLE
driver Laplacian
Check : Numbering OK
Check : Sort CSC OK
Check : Duplicates OK
+--------------------------------------------------------------------+
+ PaStiX : Parallel Sparse matriX package +
+--------------------------------------------------------------------+
Matrix size 10 x 10
Number of nonzeros in A 19
+--------------------------------------------------------------------+
+ Options +
+--------------------------------------------------------------------+
Version : exported
SMP_SOPALIN : Defined
VERSION MPI : Defined
PASTIX_DYNSCHED : Not defined
STATS_SOPALIN : Not defined
NAPA_SOPALIN : Defined
TEST_IRECV : Not defined
TEST_ISEND : Defined
TAG : Exact Thread
FORCE_CONSO : Not defined
RECV_FANIN_OR_BLOCK : Not defined
OUT_OF_CORE : Not defined
DISTRIBUTED : Defined
METIS : Defined
WITH_SCOTCH : Defined
INTEGER TYPE : int64_t
FLOAT TYPE : double
+--------------------------------------------------------------------+
Time to compute ordering 0.0052 s
Time to analyze 0.000964 s
Number of nonzeros in factorized matrice 16
Fill-in 0.842105
Number of operations (LLt) 89
Prediction Time to factorize (AMD 6180 MKL) 1.97e-05 s
--- Sopalin : Threads are binded ---
GMRES :
- iteration 1 :
time to solve 0.000812 s
total iteration time 0.00113 s
error 3.3307e-16
Static pivoting 0
Inertia 10
[shc-b:6465] *** An error occurred in MPI_Allreduce
[shc-b:6465] *** on communicator MPI_COMM_WORLD
[shc-b:6465] *** MPI_ERR_TRUNCATE: message truncated
[shc-b:6465] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
Time to factorize 0.00747 s
Time to solve 0.000484 s
Refinement 1 iterations, norm=3.33067e-16
Time for refinement 0.00129 s
--------------------------------------------------------------------------
mpirun has exited due to process rank 3 with PID 6465 on
node shc-b exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[shc-b:06459] 3 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[shc-b:06459] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[shc-b:06459] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal


Of course, in the above test run the matrix may not have been distributed in the way that I wanted, but that's not what I am concerned about. What I am concerned about is the error in the MPI call, which, based on my past experience, is likely caused by a mismatched send-recv pair. This could be a result of improperly "tagged" messages when multiple processors send messages to the same destination. You may want to check this part in the next release of PaStiX.

The same error was observed when the test run was repeated with -np 4 -lap 30 and -np 4 -lap 50. As the problem size increases, the error seems to disappear.

I'll test the 0-sized distribution later tomorrow.

Best Regards,
Guo Luo

RE: simple_dist segmentation fault [ Reply ]
By: Xavier Lacoste on 2013-03-13 10:16
[forum:110736]
Thanks for the Metis 5 patch, we will add it to our sources for the next release.

We'll also try using parmetis instead of PT-Scotch.

Thanks again,

Have a nice day,

XL.

RE: simple_dist segmentation fault [ Reply ]
By: Guo Luo on 2013-03-13 09:28
[forum:110735]

Attachment: pastix_with_metis_5.c
Hi XL, thanks for your quick reply! I did check OpenMPI 1.6.4, but it seems to me that they haven't done much to improve the thread functionality of the library. It looks like multi-threading support is not among their top priorities.

As for the distribution of the matrix / right-hand side, it is a good idea to allow users to specify their own distribution and have the solver stick with it, as long as they know what they are doing. Alternatively, the user may just use the solver as a black box without worrying about the distribution at all.

By the way, the current version of PaStiX calls the Metis 4.x API, which supports only 32-bit integer arithmetic. The more recent Metis 5.x package supports both 32- and 64-bit integer arithmetic (just like Scotch 5.x, 6.x), and it may be desirable to support it in future releases of PaStiX. I have already modified pastix.c to use Metis 5.x instead of Metis 4.x; attached you may find the changes that I've made. It would also be easy to use ParMetis 4.x (which is based on Metis 5.x) to compute parallel orderings, but I didn't make those changes because I am not sure which internal data structures should be modified. If you are interested you may incorporate these changes in the future.

Thanks again for your time!

Best Regards,
Guo Luo

-----------------------------------

Changes to pastix.c to use Metis 5.x (complete source in the attachment):

1593,1594c1593
< INT baseval;
< INT opt[8];
---
> INT opt[METIS_NOPTIONS];
1596,1598c1595
< baseval = 1;
<
< if (sizeof(INT) != sizeof(int))
---
> if (sizeof(INT) != sizeof(idx_t))
1609c1606,1607
< opt[OPTION_PTYPE ] = (iparm[IPARM_DEFAULT_ORDERING]==API_YES)?0:1;
---
> METIS_SetDefaultOptions(opt);
> opt[METIS_OPTION_NUMBERING] = 1;
1612c1610,1611
< opt[OPTION_PTYPE ] = 0;
---
> /* OPTION_PTYPE is not set in Metis 4.x *
> opt[METIS_OPTION_PTYPE ] = METIS_PTYPE_KWAY; */
1614,1620c1613,1622
< opt[OPTION_CTYPE ] = iparm[IPARM_ORDERING_SWITCH_LEVEL];
< opt[OPTION_ITYPE ] = iparm[IPARM_ORDERING_CMIN];
< opt[OPTION_RTYPE ] = iparm[IPARM_ORDERING_CMAX];
< opt[OPTION_DBGLVL ] = iparm[IPARM_ORDERING_FRAT];
< opt[OPTION_OFLAGS ] = iparm[IPARM_STATIC_PIVOTING];
< opt[OPTION_PFACTOR] = iparm[IPARM_METIS_PFACTOR];
< opt[OPTION_NSEPS ] = iparm[IPARM_NNZEROS];
---
> if (iparm[IPARM_DEFAULT_ORDERING] != API_YES){
> opt[METIS_OPTION_CTYPE ] = iparm[IPARM_ORDERING_SWITCH_LEVEL];
> opt[METIS_OPTION_IPTYPE ] = iparm[IPARM_ORDERING_CMIN];
> opt[METIS_OPTION_RTYPE ] = iparm[IPARM_ORDERING_CMAX];
> opt[METIS_OPTION_DBGLVL ] = iparm[IPARM_ORDERING_FRAT];
> opt[METIS_OPTION_COMPRESS] = iparm[IPARM_STATIC_PIVOTING] & 1;
> opt[METIS_OPTION_CCORDER ] = iparm[IPARM_STATIC_PIVOTING] & 2;
> opt[METIS_OPTION_PFACTOR ] = iparm[IPARM_METIS_PFACTOR];
> opt[METIS_OPTION_NSEPS ] = iparm[IPARM_NNZEROS];
> }
1624c1626
< METIS_NodeND(&n, *col2, *row2, &baseval,
---
> METIS_NodeND(&n, *col2, *row2, NULL,
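
For reference, here is a minimal standalone call of the Metis 5.x ordering API, independent of pastix.c (the tiny graph is just a made-up example, a path on 4 vertices):

#include <stdio.h>
#include <metis.h>

int main(void)
{
  /* Path graph 0-1-2-3 in CSR form (0-based, no self loops). */
  idx_t nvtxs    = 4;
  idx_t xadj[]   = { 0, 1, 3, 5, 6 };
  idx_t adjncy[] = { 1, 0, 2, 1, 3, 2 };
  idx_t perm[4], iperm[4];
  idx_t options[METIS_NOPTIONS];

  METIS_SetDefaultOptions(options);
  options[METIS_OPTION_NUMBERING] = 0;   /* 0-based here; PaStiX passes 1-based */

  if (METIS_NodeND(&nvtxs, xadj, adjncy, NULL, options, perm, iperm) != METIS_OK) {
    fprintf(stderr, "METIS_NodeND failed\n");
    return 1;
  }
  for (int i = 0; i < 4; i++)
    printf("perm[%d] = %d\n", i, (int)perm[i]);
  return 0;
}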

RE: simple_dist segmentation fault [ Reply ]
By: Xavier Lacoste on 2013-03-13 08:03
[forum:110733]
Hello,

Maybe try a more up-to-date version of OpenMPI (the latest is 1.6.4); otherwise there may be a bug in either PT-Scotch or OpenMPI.

About cscd_redispatch: it is used in the examples, but it's not the smartest thing the user can do (not in your case, but when users can choose the distribution as they want). I have to add an explanation for this in the distributed examples as well... It is nearly what is done internally inside PaStiX. In the general case, users can do smarter things to reduce communications.

Regards,

XL.

RE: simple_dist segmentation fault [ Reply ]
By: Guo Luo on 2013-03-13 06:43
[forum:110732]
Hi XL, thanks a lot for your quick reply! Yes, I did build my OpenMPI to support MPI_THREAD_MULTIPLE, and this is how I ran the test programs in src/examples. It's just that OpenMPI is not thread safe.

Thanks also for clarifying my questions! As for (3), I don't want to use the distribution given by the solver because I need to use a special distribution (given by FFTW3) to perform global transposition. Otherwise I don't mind using the solver's distribution and it's not terribly difficult to call cscd_redispatch().

And yes, it would be very helpful if you can add comments to the step-by-step example to explain the "best practice" of using PaStiX.

Thanks again for all your time!

Best Regards,
Guo Luo

RE: simple_dist segmentation fault [ Reply ]
By: Xavier Lacoste on 2013-03-13 05:20
[forum:110730]
Hello,

For the MPI version you can use various MPI libraries (MPICH2, OpenMPI, MVAPICH2, Intel MPI...) as long as they support MPI_THREAD_MULTIPLE (you need to build them with the correct compilation options). This is needed with SCOTCH_PTHREAD and, depending on IPARM_THREAD_NBR and IPARM_THREAD_COMM_MODE, in PaStiX as well.

Yes you can use this strategy, it seems good to me.
(1) true,
(2) true again,
(3) If you don't care about PaStiX's internal distribution, all redistribution will be internal and you don't need to take care of it (and I'm currently working on improving the murge interface to make it easier to stick to the solver's distribution).

I'll add comments to the step-by-step example documenting questions (1) and (2), and write something about (3) in the documentation => TODO list.

Thanks again for your feedback,

XL.

RE: simple_dist segmentation fault [ Reply ]
By: Guo Luo on 2013-03-13 00:58
[forum:110729]
Hi XL, thanks a lot for your quick reply. Yes, I've rebuilt PT-Scotch without the -DSCOTCH_PTHREAD flag and run the test program simple_dist with MPI_Init_thread() replaced by MPI_Init(). This time it does work and produces the correct results. There is no need to remove -DCOMMON_PTHREAD because it doesn't require multi-thread support, which is what is causing the problem.

By the way, I am using OpenMPI 1.4.5 and didn't realize it is not absolutely thread safe until I read its documentation. I guess you must be using a different MPI implementation (MPICH?) when testing PaStiX.

I have another quick question if you don't mind. I am using PaStiX to solve a Poisson equation over and over again with different right-hand sides (this is part of solving a time-dependent PDE). The Poisson equation is discretized on an adaptive mesh, which gives a linear system whose coefficient matrix depends on the underlying mesh. While the matrix changes every time the mesh is changed, the non-zero pattern remains the same.

So my question is, can I solve the problem using the following strategy?

/* initial set-up and analysis */
set parameters of the dpastix solver;
initialize the dpastix solver with API_TASK_INIT;
fill in the matrix A and right-hand side b at t = 0 (using user-determined distribution);
call dpastix with the given A, b, and
iparm[IPARM_START_TASK] = API_TASK_ORDERING;
iparm[IPARM_END_TASK] = API_TASK_ANALYSE;

/* advance the solution */
while t < T
/* numerical factorization */
call dpastix with the given A, b, and
iparm[IPARM_START_TASK] = API_TASK_NUMFACT;
iparm[IPARM_END_TASK] = API_TASK_NUMFACT;

while mesh is not changed
/* numerical solve */
call dpastix with the given A, b, and
iparm[IPARM_START_TASK] = API_TASK_SOLVE;
iparm[IPARM_END_TASK] = API_TASK_REFINE;
update the solution from t to t+h;
t = t+h;

compute the new right-hand side b;
determine if mesh needs to be adapted;
end while

/* if mesh adaptation is necessary */
adapt the mesh;
compute the new matrix A and new right-hand side b;
end while

/* cleanup */
call dpastix with
iparm[IPARM_START_TASK] = API_TASK_CLEAN;
iparm[IPARM_END_TASK] = API_TASK_CLEAN;
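
In code, the factorization / solve part of the loop would look roughly like the fragment below. This is only a sketch based on the simple_dist example; the dpastix() argument list is written from memory and the surrounding set-up (pastix_data, n, colptr, rows, values, loc2glob, perm, invp, b, iparm, dparm) is assumed to exist already:

/* Numerical factorization: redo only when the matrix A has changed. */
iparm[IPARM_START_TASK] = API_TASK_NUMFACT;
iparm[IPARM_END_TASK]   = API_TASK_NUMFACT;
dpastix(&pastix_data, MPI_COMM_WORLD, n, colptr, rows, values,
        loc2glob, perm, invp, b, 1, iparm, dparm);

/* Solve + refinement: redo for every new right-hand side b. */
iparm[IPARM_START_TASK] = API_TASK_SOLVE;
iparm[IPARM_END_TASK]   = API_TASK_REFINE;
dpastix(&pastix_data, MPI_COMM_WORLD, n, colptr, rows, values,
        loc2glob, perm, invp, b, 1, iparm, dparm);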


In particular, I am wondering if it's true that
(1). the ordering, symbolic factorization, and analysis steps need to be done *only once*; no numerical values of A or b are used in these steps;
(2). the numerical factorization needs to be done only once for each matrix A, and the results are stored internally in pastix_data;
(3). *no redispatch* of the matrix A or the right-hand side b is needed before the numerical solve step if I want to preserve the user-determined distribution.

After browsing through the source pastix.c, I believe that the answers to the above questions are all "yes". Since they are not explicitly documented in the user's guide or in the sample programs, I'd be very grateful if you could kindly confirm my conjectures.

Thanks again for your time.

Best Regards,
Guo Luo

RE: simple_dist segmentation fault [ Reply ]
By: Xavier Lacoste on 2013-03-12 08:18
[forum:110720]
Hello and thanks for your feedback.

If you want to check whether it comes from a problem with PT-Scotch threads, you can remove all the thread flags from its Makefile.inc (-DCOMMON_PTHREAD, -DSCOTCH_PTHREAD).

If the problem is still there, you can add a SCOTCH_dgraphSave() call before the SCOTCH_dgraphOrderCompute() call in sopalin/src/pastix.c, like:
{
  char name[256];
  FILE *stream;
  sprintf(name, "dgraph_%d", procnum);  /* note: no '\n' in the file name */
  stream = fopen(name, "w");
  SCOTCH_dgraphSave(dgraph, stream);
  fclose(stream);
}

And try running it directly in Scotch using dgord:
> mpirun -np nbproc dgord dgraph_%r

And tell us if it runs.

Tell me if I'm not clear,

XL.

simple_dist segmentation fault [ Reply ]
By: Guo Luo on 2013-03-12 00:34
[forum:110719]

Attachment: diff_LINUX-INTEL_config.txt
Dear Pastix developers,

I recently installed PaStiX 5.2.1 on my Linux x86_64 cluster using the Intel C/C++ compiler 11.1. The configuration is based on src/config/LINUX-INTEL.in, with the modifications listed in the attached file diff_LINUX-INTEL_config.txt. The ordering package is ptscotch_5.1.11_esmumps built with -DSCOTCH_PTHREAD -DIDXSIZE64 -DINTSIZE64.

The installation (make all; make examples) seems to be successful, and running the test program

> mpirun -np 2 src/example/bin/simple_dist -lap 100

gives correct results. However, when I tried running the program on more processors (16, 32, or 64) I began to get segmentation faults, for example:

>mpirun -np 32 ./simple_dist -lap 1000
MPI_Init_thread level = MPI_THREAD_MULTIPLE
driver Laplacian
Check : Numbering OK
Check : Sort CSC OK
Check : Duplicates OK
+--------------------------------------------------------------------+
+ PaStiX : Parallel Sparse matriX package +
+--------------------------------------------------------------------+
Matrix size 1000 x 1000
Number of nonzeros in A 1999
+--------------------------------------------------------------------+
+ Options +
+--------------------------------------------------------------------+
Version : exported
SMP_SOPALIN : Defined
VERSION MPI : Defined
PASTIX_DYNSCHED : Not defined
STATS_SOPALIN : Not defined
NAPA_SOPALIN : Defined
TEST_IRECV : Not defined
TEST_ISEND : Defined
TAG : Exact Thread
FORCE_CONSO : Not defined
RECV_FANIN_OR_BLOCK : Not defined
OUT_OF_CORE : Not defined
DISTRIBUTED : Defined
METIS : Defined
WITH_SCOTCH : Defined
INTEGER TYPE : int64_t
FLOAT TYPE : double
+--------------------------------------------------------------------+
[shc180:30392] *** Process received signal ***
[shc180:30392] Signal: Segmentation fault (11)
[shc180:30392] Signal code: Address not mapped (1)
[shc180:30392] Failing at address: 0xa5ef000
[shc190:02121] *** Process received signal ***
[shc190:02121] Signal: Segmentation fault (11)
[shc190:02121] Signal code: Address not mapped (1)
[shc190:02121] Failing at address: 0xf6e000
[shc180:30392] [ 0] /lib64/libpthread.so.0 [0x2afc7595fb10]
[shc180:30392] [ 1] ./simple_dist(_SCOTCHhdgraphFold2+0x1ac0) [0x6495a0]
[shc180:30392] [ 2] ./simple_dist [0x62e1b7]
[shc180:30392] [ 3] ./simple_dist(_SCOTCHhdgraphOrderNd+0x32b) [0x62db0b]
[shc180:30392] [ 4] ./simple_dist(_SCOTCHhdgraphOrderNd+0x40d) [0x62dbed]
[shc180:30392] [ 5] ./simple_dist(_SCOTCHhdgraphOrderSt+0x8c) [0x62797c]
[shc180:30392] [ 6] ./simple_dist(SCOTCH_dgraphOrderCompute+0xa7) [0x61a327]
[shc180:30392] [ 7] ./simple_dist(dpastix_task_scotch+0xb07) [0x4574e7]
[shc180:30392] [ 8] ./simple_dist(dpastix+0x5a1) [0x454551]
[shc180:30392] [ 9] ./simple_dist(main+0x6ea) [0x4219ea]
--------------------------------------------------------------------------
orterun noticed that process rank 2 with PID 2121 on node shc190 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
[shc181][[52130,1],10][btl_tcp_frag.c:214:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
[shc181][[52130,1],8][btl_tcp_frag.c:214:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
[shc180:30392] [10] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2afc7841b994]


Curiously, it seems that as long as I launch the program with no more than 8 processors, the program completes successfully, no matter how large the problem is (of course, no larger than the memory limit of the compute nodes). Since each node in my Linux cluster has 8 cores, I am wondering if this is a problem related to the thread safety of the PT-Scotch package. Do you have any suggestions on this? Any help from you will be greatly appreciated. I am writing a code to solve a Poisson equation on a very large mesh, and really need PaStiX to do the linear solves for me.

Thank you very much for your time.

Best Regards,
Guo Luo