
Forum: help

RE: MPI problems with new version
By: Xavier Lacoste on 2012-06-19 14:39
[forum:109797]
Hello again,

I tried to reproduce your problem by performing the factorization before the redistribution in simple_dist.c:

diff --git example/src/simple_dist.c example/src/simple_dist.c
index 6784c4c..3d24881 100644
--- example/src/simple_dist.c
+++ example/src/simple_dist.c
@@ -249,6 +249,13 @@ int main (int argc, char **argv)

   pastix_getLocalNodeLst(&pastix_data, loc2glob2);

+  iparm[IPARM_START_TASK] = API_TASK_NUMFACT;
+  iparm[IPARM_END_TASK]   = API_TASK_NUMFACT;
+
+  dpastix(&pastix_data, MPI_COMM_WORLD,
+          ncol, colptr, rows, values, loc2glob,
+          perm, invp, rhs2, 1, iparm, dparm);
+
   if (EXIT_SUCCESS != cscd_redispatch(ncol, colptr, rows, values, rhs, loc2glob,
                                       ncol2, &colptr2, &rows2, &values2, &rhs2, loc2glob2,
                                       MPI_COMM_WORLD))
@@ -261,7 +268,7 @@ int main (int argc, char **argv)
   free(loc2glob);
   free(perm);

-  iparm[IPARM_START_TASK] = API_TASK_NUMFACT;
+  iparm[IPARM_START_TASK] = API_TASK_SOLVE;
   iparm[IPARM_END_TASK]   = API_TASK_CLEAN;

   PRINT_RHS("RHS", rhs2, ncol2, mpid, iparm[IPARM_VERBOSE]);


But this worked correctly, so I could not reproduce the problem...

Can you try it with your matrix, or send the matrix to me so that I can test?

Thanks,

XL.

EDIT: Using your approach all the time seems like a good one: since you need the solution in your own distribution anyway, you can simply give everything in that distribution and let PaStiX redistribute the data internally... But I still need to fix the behavior of the other methods anyway...
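
In other words, something like this, keeping your own loc2glob for both the matrix and the RHS and running everything in one call (just a sketch, see simple_dist.c for the full set-up of the arguments):

/* Matrix, RHS and solution all stay in the user's distribution;
 * PaStiX redistributes the data internally and writes the solution
 * back into rhs, in the same distribution as the input. */
iparm[IPARM_START_TASK] = API_TASK_ORDERING;
iparm[IPARM_END_TASK]   = API_TASK_CLEAN;
dpastix(&pastix_data, MPI_COMM_WORLD,
        ncol, colptr, rows, values, loc2glob,
        perm, invp, rhs, 1, iparm, dparm);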

EDIT 2: I think I did reproduce it with this patch:

diff --git example/src/simple_dist.c example/src/simple_dist.c
index 6784c4c..8476534 100644
--- example/src/simple_dist.c
+++ example/src/simple_dist.c
@@ -189,7 +189,7 @@ int main (int argc, char **argv)
     return EXIT_FAILURE;

   iparm[IPARM_START_TASK] = API_TASK_ORDERING;
-  iparm[IPARM_END_TASK]   = API_TASK_BLEND;
+  iparm[IPARM_END_TASK]   = API_TASK_NUMFACT;


   /*******************************************/
@@ -261,7 +268,7 @@ int main (int argc, char **argv)
   free(loc2glob);
   free(perm);

-  iparm[IPARM_START_TASK] = API_TASK_NUMFACT;
+  iparm[IPARM_START_TASK] = API_TASK_SOLVE;
   iparm[IPARM_END_TASK]   = API_TASK_CLEAN;

   PRINT_RHS("RHS", rhs2, ncol2, mpid, iparm[IPARM_VERBOSE]);


EDIT 3: I was not reproducing your bug but creating a new one by running the numerical factorization with a NULL values pointer... So I'm back to square one, still trying to reproduce your bug... (and I need to add an error message when NUMFACT is run with a NULL values pointer ;) )
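
Something like this kind of guard is what I have in mind (purely illustrative, not the actual PaStiX source; the error code name is only an example):

/* Illustrative sketch: refuse to run the numerical factorization
 * when no numerical values have been provided. */
if (values == NULL &&
    iparm[IPARM_START_TASK] <= API_TASK_NUMFACT &&
    iparm[IPARM_END_TASK]   >= API_TASK_NUMFACT)
  {
    fprintf(stderr, "ERROR: NUMFACT requested but the values pointer is NULL\n");
    return BADPARAMETER_ERR;
  }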

RE: MPI problems with new version
By: Nobody on 2012-06-19 12:32
[forum:109795]
Hi,

I believe we noticed that the matrix was given in the user's distribution but "forgot" to check whether the RHS is given in a different distribution. I'll check what destroyed the correct behavior we had before.

For the forum, you can also enlarge the text box by dragging its bottom-right corner.

XL

RE: MPI problems with new version
By: Pierre Ramet on 2012-06-15 13:22
[forum:109788]
Hi Garth,
So I think something is broken in the new release; we will investigate.
Regarding an example with a user distribution, the drivers fmurge.F90 or murge.c could help, but they use the "murge" API. We have added to our todo list a new "simple" driver to check that functionality.
Regarding the forum interface, I have the same trouble, but before submitting a bug report I discovered that you can open a popup to edit the message by clicking on the icon next to the "Message" header.
Regards,
Pierre.

RE: MPI problems with new version
By: Garth Wells on 2012-06-15 11:02
[forum:109787]
Hi Pierre,

I had a lot of trouble upgrading (seg faults). It seems to work now, but I have changed the re-numbering. With the old version I did the following (there is a rough code sketch after the list):

1. Build distributed matrix using my own distribution
2. Factorise matrix with PaStiX
3. Get PaStiX distribution for RHS
4. Supply RHS in *PaStiX* distribution
5. Solve, get x in *PaStiX* distribution
6. Redistribute x using my distribution
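
Roughly, the old call sequence looked like this (a simplified sketch based on the simple_dist.c example rather than my exact code, so argument details may not be exact; build_rhs_on() and redistribute_to_my_layout() are placeholders for my own routines):

/* 1-2: analyse and factorise the matrix given in my own distribution
 * (colptr/rows/values/loc2glob describe the local columns I own). */
iparm[IPARM_START_TASK] = API_TASK_ORDERING;
iparm[IPARM_END_TASK]   = API_TASK_NUMFACT;
dpastix(&pastix_data, MPI_COMM_WORLD,
        ncol, colptr, rows, values, loc2glob,
        perm, invp, NULL, 1, iparm, dparm);

/* 3: ask PaStiX for its own distribution of the unknowns. */
ncol2     = pastix_getLocalNodeNbr(&pastix_data);
loc2glob2 = malloc(ncol2 * sizeof(pastix_int_t));
rhs2      = malloc(ncol2 * sizeof(pastix_float_t));
pastix_getLocalNodeLst(&pastix_data, loc2glob2);

/* 4-5: build the RHS on the PaStiX distribution and solve; the matrix
 * arguments are still in my distribution, and x comes back in rhs2,
 * i.e. in the PaStiX distribution. */
build_rhs_on(loc2glob2, ncol2, rhs2);               /* placeholder */
iparm[IPARM_START_TASK] = API_TASK_SOLVE;
iparm[IPARM_END_TASK]   = API_TASK_CLEAN;
dpastix(&pastix_data, MPI_COMM_WORLD,
        ncol, colptr, rows, values, loc2glob,
        perm, invp, rhs2, 1, iparm, dparm);

/* 6: bring x back into my own distribution. */
redistribute_to_my_layout(rhs2, ncol2, loc2glob2);  /* placeholder */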

This seg faulted with the new version. To get the new version to work, I now do the following (again, a sketch in code follows the list):

1. Build distributed matrix using my own distribution
2. Factorise matrix
4. Supply RHS in *my* distribution
5. Solve, get x in *my* distribution
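
In code, the new sequence is simply (again just a sketch, not my exact code):

/* Analyse and factorise the matrix given in my own distribution. */
iparm[IPARM_START_TASK] = API_TASK_ORDERING;
iparm[IPARM_END_TASK]   = API_TASK_NUMFACT;
dpastix(&pastix_data, MPI_COMM_WORLD,
        ncol, colptr, rows, values, loc2glob,
        perm, invp, rhs, 1, iparm, dparm);

/* Solve with the RHS laid out exactly like my matrix columns;
 * x overwrites rhs and stays in my distribution. */
iparm[IPARM_START_TASK] = API_TASK_SOLVE;
iparm[IPARM_END_TASK]   = API_TASK_CLEAN;
dpastix(&pastix_data, MPI_COMM_WORLD,
        ncol, colptr, rows, values, loc2glob,
        perm, invp, rhs, 1, iparm, dparm);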

I have verified my solution for correctness. I assume that a default
option in PaStiX has been changed?

I did not set

iparm[IPARM_CSCD_CORRECT] = API_NO

The documentation on distributed matrices and vectors in the user's distribution is not very clear. Maybe an example that builds a matrix in parallel (i.e. not from a file and not according to the solver's distribution) and then solves it with PaStiX would be helpful?
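
Something along these lines is what I have in mind, for instance a 1D Laplacian where each process builds only its own block of columns and then hands everything to dpastix (just a sketch to illustrate the request; I have not tried to compile it, and the parameter set-up is probably incomplete):

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include "pastix.h"

/* Sketch: each process owns a contiguous block of columns of a 1D
 * Laplacian (tridiagonal, n global unknowns) and solves A x = 1. */
int main(int argc, char **argv)
{
  pastix_data_t  *pastix_data = NULL;
  pastix_int_t    iparm[IPARM_SIZE];
  double          dparm[DPARM_SIZE];
  pastix_int_t    n = 1000, ncol, first, i, nnz;
  pastix_int_t   *colptr, *rows, *loc2glob, *perm, *invp;
  pastix_float_t *values, *rhs;
  int             rank, size;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  /* Block column distribution: this rank owns global columns
   * [first, first + ncol - 1], 1-based. */
  ncol  = n / size + ((rank < n % size) ? 1 : 0);
  first = rank * (n / size) + ((rank < n % size) ? rank : n % size) + 1;

  colptr   = malloc((ncol + 1) * sizeof(pastix_int_t));
  loc2glob = malloc(ncol       * sizeof(pastix_int_t));
  rows     = malloc(3 * ncol   * sizeof(pastix_int_t));   /* upper bound */
  values   = malloc(3 * ncol   * sizeof(pastix_float_t));
  rhs      = malloc(ncol       * sizeof(pastix_float_t));
  perm     = malloc(ncol       * sizeof(pastix_int_t));
  invp     = malloc(ncol       * sizeof(pastix_int_t));

  /* Build the local columns in 1-based CSC format. */
  nnz = 0;
  for (i = 0; i < ncol; i++) {
    pastix_int_t j = first + i;          /* global column index */
    colptr[i]   = nnz + 1;
    loc2glob[i] = j;
    rhs[i]      = 1.0;
    if (j > 1) { rows[nnz] = j - 1; values[nnz] = -1.0; nnz++; }
    rows[nnz] = j;      values[nnz] =  2.0; nnz++;
    if (j < n) { rows[nnz] = j + 1; values[nnz] = -1.0; nnz++; }
  }
  colptr[ncol] = nnz + 1;

  /* Get the default parameters. */
  iparm[IPARM_MODIFY_PARAMETER] = API_NO;
  iparm[IPARM_START_TASK]       = API_TASK_INIT;
  iparm[IPARM_END_TASK]         = API_TASK_INIT;
  dpastix(&pastix_data, MPI_COMM_WORLD, ncol, colptr, rows, values,
          loc2glob, perm, invp, rhs, 1, iparm, dparm);

  /* The matrix is stored with all of its entries, so ask for LU. */
  iparm[IPARM_SYM]           = API_SYM_NO;
  iparm[IPARM_FACTORIZATION] = API_FACT_LU;

  /* All steps in one call; rhs is overwritten with x, still laid out
   * according to loc2glob, i.e. in my distribution. */
  iparm[IPARM_START_TASK] = API_TASK_ORDERING;
  iparm[IPARM_END_TASK]   = API_TASK_CLEAN;
  dpastix(&pastix_data, MPI_COMM_WORLD, ncol, colptr, rows, values,
          loc2glob, perm, invp, rhs, 1, iparm, dparm);

  free(colptr); free(rows); free(values); free(loc2glob);
  free(perm); free(invp); free(rhs);
  MPI_Finalize();
  return 0;
}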

Is there an easier way to use this forum? When I reply to an email as instructed, it bounces back, and via the web interface I get a window that only shows one line of my message.

Regards,
Garth

RE: MPI problems with new version
By: Pierre Ramet on 2012-06-15 09:06
[forum:109784]
Hi Garth, thank you for this problem report.
This message means that the distribution of your matrix does not match the distribution PaStiX uses internally for the factorization and solve (or that you have explicitly set iparm[IPARM_CSCD_CORRECT] = API_NO). This should not be a problem, since PaStiX redistributes the data by itself.
When you say "Has something changed?", I understand that it was working with the previous release... I will check with Xavier next Monday whether something has changed in this part, and I remember we have a simple driver (do_not_redispatch_rhs.c) in the example repository to check this option.
Regards,
Pierre.

MPI problems with new version
By: Garth Wells on 2012-06-13 10:49
[forum:109783]
I'm building a distributed matrix using my own distribution. The PaStiX matrix check function is fine, and I can factorise the matrix. However, at the solve stage PaStiX seg faults when run with MPI, just as it prints

Redistributing solution into Users' distribution

If I make the array containing the RHS quite a bit larger than is
necessary, it usually runs without seg faulting. Does anyone have some
tips? Has something changed? Can I stop PaStiX distributing the
solution?

Garth