LAPACK  3.5.0
LAPACK: Linear Algebra PACKage
dgejsv.f File Reference

Go to the source code of this file.

Functions/Subroutines

subroutine dgejsv (JOBA, JOBU, JOBV, JOBR, JOBT, JOBP, M, N, A, LDA, SVA, U, LDU, V, LDV, WORK, LWORK, IWORK, INFO)
 DGEJSV More...
 

Function/Subroutine Documentation

subroutine dgejsv ( character*1  JOBA,
character*1  JOBU,
character*1  JOBV,
character*1  JOBR,
character*1  JOBT,
character*1  JOBP,
integer  M,
integer  N,
double precision, dimension( lda, * )  A,
integer  LDA,
double precision, dimension( n )  SVA,
double precision, dimension( ldu, * )  U,
integer  LDU,
double precision, dimension( ldv, * )  V,
integer  LDV,
double precision, dimension( lwork )  WORK,
integer  LWORK,
integer, dimension( * )  IWORK,
integer  INFO 
)

DGEJSV

Download DGEJSV + dependencies [TGZ] [ZIP] [TXT]

Purpose:
 DGEJSV computes the singular value decomposition (SVD) of a real M-by-N
 matrix [A], where M >= N. The SVD of [A] is written as

              [A] = [U] * [SIGMA] * [V]^t,

 where [SIGMA] is an N-by-N (M-by-N) matrix which is zero except for its N
 diagonal elements, [U] is an M-by-N (or M-by-M) orthonormal matrix, and
 [V] is an N-by-N orthogonal matrix. The diagonal elements of [SIGMA] are
 the singular values of [A]. The columns of [U] and [V] are the left and
 the right singular vectors of [A], respectively. The matrices [U] and [V]
 are computed and stored in the arrays U and V, respectively. The diagonal
 of [SIGMA] is computed and stored in the array SVA.
Parameters
[in]JOBA
          JOBA is CHARACTER*1
        Specifies the level of accuracy:
       = 'C': This option works well (high relative accuracy) if A = B * D,
             with well-conditioned B and arbitrary diagonal matrix D.
             The accuracy cannot be spoiled by COLUMN scaling. The
             accuracy of the computed output depends on the condition of
             B, and the procedure aims at the best theoretical accuracy.
             The relative error max_{i=1:N}|d sigma_i| / sigma_i is
             bounded by f(M,N)*epsilon* cond(B), independent of D.
             The input matrix is preprocessed with the QRF with column
             pivoting. This initial preprocessing and preconditioning by
             a rank revealing QR factorization is common for all values of
             JOBA. Additional actions are specified as follows:
       = 'E': Computation as with 'C' with an additional estimate of the
             condition number of B. It provides a realistic error bound.
       = 'F': If A = D1 * C * D2 with ill-conditioned diagonal scalings
             D1, D2, and well-conditioned matrix C, this option gives
             higher accuracy than the 'C' option. If the structure of the
             input matrix is not known, and relative accuracy is
             desirable, then this option is advisable. The input matrix A
             is preprocessed with QR factorization with FULL (row and
             column) pivoting.
       = 'G'  Computation as with 'F' with an additional estimate of the
             condition number of B, where A=D*B. If A has heavily weighted
             rows, then using this condition number gives too pessimistic
             error bound.
       = 'A': Small singular values are the noise and the matrix is treated
             as numerically rank defficient. The error in the computed
             singular values is bounded by f(m,n)*epsilon*||A||.
             The computed SVD A = U * S * V^t restores A up to
             f(m,n)*epsilon*||A||.
             This gives the procedure the licence to discard (set to zero)
             all singular values below N*epsilon*||A||.
       = 'R': Similar as in 'A'. Rank revealing property of the initial
             QR factorization is used do reveal (using triangular factor)
             a gap sigma_{r+1} < epsilon * sigma_r in which case the
             numerical RANK is declared to be r. The SVD is computed with
             absolute error bounds, but more accurately than with 'A'.
[in]JOBU
          JOBU is CHARACTER*1
        Specifies whether to compute the columns of U:
       = 'U': N columns of U are returned in the array U.
       = 'F': full set of M left sing. vectors is returned in the array U.
       = 'W': U may be used as workspace of length M*N. See the description
             of U.
       = 'N': U is not computed.
[in]JOBV
          JOBV is CHARACTER*1
        Specifies whether to compute the matrix V:
       = 'V': N columns of V are returned in the array V; Jacobi rotations
             are not explicitly accumulated.
       = 'J': N columns of V are returned in the array V, but they are
             computed as the product of Jacobi rotations. This option is
             allowed only if JOBU .NE. 'N', i.e. in computing the full SVD.
       = 'W': V may be used as workspace of length N*N. See the description
             of V.
       = 'N': V is not computed.
[in]JOBR
          JOBR is CHARACTER*1
        Specifies the RANGE for the singular values. Issues the licence to
        set to zero small positive singular values if they are outside
        specified range. If A .NE. 0 is scaled so that the largest singular
        value of c*A is around DSQRT(BIG), BIG=SLAMCH('O'), then JOBR issues
        the licence to kill columns of A whose norm in c*A is less than
        DSQRT(SFMIN) (for JOBR.EQ.'R'), or less than SMALL=SFMIN/EPSLN,
        where SFMIN=SLAMCH('S'), EPSLN=SLAMCH('E').
       = 'N': Do not kill small columns of c*A. This option assumes that
             BLAS and QR factorizations and triangular solvers are
             implemented to work in that range. If the condition of A
             is greater than BIG, use DGESVJ.
       = 'R': RESTRICTED range for sigma(c*A) is [DSQRT(SFMIN), DSQRT(BIG)]
             (roughly, as described above). This option is recommended.
                                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~
        For computing the singular values in the FULL range [SFMIN,BIG]
        use DGESVJ.
[in]JOBT
          JOBT is CHARACTER*1
        If the matrix is square then the procedure may determine to use
        transposed A if A^t seems to be better with respect to convergence.
        If the matrix is not square, JOBT is ignored. This is subject to
        changes in the future.
        The decision is based on two values of entropy over the adjoint
        orbit of A^t * A. See the descriptions of WORK(6) and WORK(7).
       = 'T': transpose if entropy test indicates possibly faster
        convergence of Jacobi process if A^t is taken as input. If A is
        replaced with A^t, then the row pivoting is included automatically.
       = 'N': do not speculate.
        This option can be used to compute only the singular values, or the
        full SVD (U, SIGMA and V). For only one set of singular vectors
        (U or V), the caller should provide both U and V, as one of the
        matrices is used as workspace if the matrix A is transposed.
        The implementer can easily remove this constraint and make the
        code more complicated. See the descriptions of U and V.
[in]JOBP
          JOBP is CHARACTER*1
        Issues the licence to introduce structured perturbations to drown
        denormalized numbers. This licence should be active if the
        denormals are poorly implemented, causing slow computation,
        especially in cases of fast convergence (!). For details see [1,2].
        For the sake of simplicity, this perturbations are included only
        when the full SVD or only the singular values are requested. The
        implementer/user can easily add the perturbation for the cases of
        computing one set of singular vectors.
       = 'P': introduce perturbation
       = 'N': do not perturb
[in]M
          M is INTEGER
         The number of rows of the input matrix A.  M >= 0.
[in]N
          N is INTEGER
         The number of columns of the input matrix A. M >= N >= 0.
[in,out]A
          A is DOUBLE PRECISION array, dimension (LDA,N)
          On entry, the M-by-N matrix A.
[in]LDA
          LDA is INTEGER
          The leading dimension of the array A.  LDA >= max(1,M).
[out]SVA
          SVA is DOUBLE PRECISION array, dimension (N)
          On exit,
          - For WORK(1)/WORK(2) = ONE: The singular values of A. During the
            computation SVA contains Euclidean column norms of the
            iterated matrices in the array A.
          - For WORK(1) .NE. WORK(2): The singular values of A are
            (WORK(1)/WORK(2)) * SVA(1:N). This factored form is used if
            sigma_max(A) overflows or if small singular values have been
            saved from underflow by scaling the input matrix A.
          - If JOBR='R' then some of the singular values may be returned
            as exact zeros obtained by "set to zero" because they are
            below the numerical rank threshold or are denormalized numbers.
[out]U
          U is DOUBLE PRECISION array, dimension ( LDU, N )
          If JOBU = 'U', then U contains on exit the M-by-N matrix of
                         the left singular vectors.
          If JOBU = 'F', then U contains on exit the M-by-M matrix of
                         the left singular vectors, including an ONB
                         of the orthogonal complement of the Range(A).
          If JOBU = 'W'  .AND. (JOBV.EQ.'V' .AND. JOBT.EQ.'T' .AND. M.EQ.N),
                         then U is used as workspace if the procedure
                         replaces A with A^t. In that case, [V] is computed
                         in U as left singular vectors of A^t and then
                         copied back to the V array. This 'W' option is just
                         a reminder to the caller that in this case U is
                         reserved as workspace of length N*N.
          If JOBU = 'N'  U is not referenced.
[in]LDU
          LDU is INTEGER
          The leading dimension of the array U,  LDU >= 1.
          IF  JOBU = 'U' or 'F' or 'W',  then LDU >= M.
[out]V
          V is DOUBLE PRECISION array, dimension ( LDV, N )
          If JOBV = 'V', 'J' then V contains on exit the N-by-N matrix of
                         the right singular vectors;
          If JOBV = 'W', AND (JOBU.EQ.'U' AND JOBT.EQ.'T' AND M.EQ.N),
                         then V is used as workspace if the pprocedure
                         replaces A with A^t. In that case, [U] is computed
                         in V as right singular vectors of A^t and then
                         copied back to the U array. This 'W' option is just
                         a reminder to the caller that in this case V is
                         reserved as workspace of length N*N.
          If JOBV = 'N'  V is not referenced.
[in]LDV
          LDV is INTEGER
          The leading dimension of the array V,  LDV >= 1.
          If JOBV = 'V' or 'J' or 'W', then LDV >= N.
[out]WORK
          WORK is DOUBLE PRECISION array, dimension at least LWORK.
          On exit, if N.GT.0 .AND. M.GT.0 (else not referenced), 
          WORK(1) = SCALE = WORK(2) / WORK(1) is the scaling factor such
                    that SCALE*SVA(1:N) are the computed singular values
                    of A. (See the description of SVA().)
          WORK(2) = See the description of WORK(1).
          WORK(3) = SCONDA is an estimate for the condition number of
                    column equilibrated A. (If JOBA .EQ. 'E' or 'G')
                    SCONDA is an estimate of DSQRT(||(R^t * R)^(-1)||_1).
                    It is computed using DPOCON. It holds
                    N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA
                    where R is the triangular factor from the QRF of A.
                    However, if R is truncated and the numerical rank is
                    determined to be strictly smaller than N, SCONDA is
                    returned as -1, thus indicating that the smallest
                    singular values might be lost.

          If full SVD is needed, the following two condition numbers are
          useful for the analysis of the algorithm. They are provied for
          a developer/implementer who is familiar with the details of
          the method.

          WORK(4) = an estimate of the scaled condition number of the
                    triangular factor in the first QR factorization.
          WORK(5) = an estimate of the scaled condition number of the
                    triangular factor in the second QR factorization.
          The following two parameters are computed if JOBT .EQ. 'T'.
          They are provided for a developer/implementer who is familiar
          with the details of the method.

          WORK(6) = the entropy of A^t*A :: this is the Shannon entropy
                    of diag(A^t*A) / Trace(A^t*A) taken as point in the
                    probability simplex.
          WORK(7) = the entropy of A*A^t.
[in]LWORK
          LWORK is INTEGER
          Length of WORK to confirm proper allocation of work space.
          LWORK depends on the job:

          If only SIGMA is needed ( JOBU.EQ.'N', JOBV.EQ.'N' ) and
            -> .. no scaled condition estimate required (JOBE.EQ.'N'):
               LWORK >= max(2*M+N,4*N+1,7). This is the minimal requirement.
               ->> For optimal performance (blocked code) the optimal value
               is LWORK >= max(2*M+N,3*N+(N+1)*NB,7). Here NB is the optimal
               block size for DGEQP3 and DGEQRF.
               In general, optimal LWORK is computed as 
               LWORK >= max(2*M+N,N+LWORK(DGEQP3),N+LWORK(DGEQRF), 7).        
            -> .. an estimate of the scaled condition number of A is
               required (JOBA='E', 'G'). In this case, LWORK is the maximum
               of the above and N*N+4*N, i.e. LWORK >= max(2*M+N,N*N+4*N,7).
               ->> For optimal performance (blocked code) the optimal value 
               is LWORK >= max(2*M+N,3*N+(N+1)*NB, N*N+4*N, 7).
               In general, the optimal length LWORK is computed as
               LWORK >= max(2*M+N,N+LWORK(DGEQP3),N+LWORK(DGEQRF), 
                                                     N+N*N+LWORK(DPOCON),7).

          If SIGMA and the right singular vectors are needed (JOBV.EQ.'V'),
            -> the minimal requirement is LWORK >= max(2*M+N,4*N+1,7).
            -> For optimal performance, LWORK >= max(2*M+N,3*N+(N+1)*NB,7),
               where NB is the optimal block size for DGEQP3, DGEQRF, DGELQ,
               DORMLQ. In general, the optimal length LWORK is computed as
               LWORK >= max(2*M+N,N+LWORK(DGEQP3), N+LWORK(DPOCON), 
                       N+LWORK(DGELQ), 2*N+LWORK(DGEQRF), N+LWORK(DORMLQ)).

          If SIGMA and the left singular vectors are needed
            -> the minimal requirement is LWORK >= max(2*M+N,4*N+1,7).
            -> For optimal performance:
               if JOBU.EQ.'U' :: LWORK >= max(2*M+N,3*N+(N+1)*NB,7),
               if JOBU.EQ.'F' :: LWORK >= max(2*M+N,3*N+(N+1)*NB,N+M*NB,7),
               where NB is the optimal block size for DGEQP3, DGEQRF, DORMQR.
               In general, the optimal length LWORK is computed as
               LWORK >= max(2*M+N,N+LWORK(DGEQP3),N+LWORK(DPOCON),
                        2*N+LWORK(DGEQRF), N+LWORK(DORMQR)). 
               Here LWORK(DORMQR) equals N*NB (for JOBU.EQ.'U') or 
               M*NB (for JOBU.EQ.'F').
               
          If the full SVD is needed: (JOBU.EQ.'U' or JOBU.EQ.'F') and 
            -> if JOBV.EQ.'V'  
               the minimal requirement is LWORK >= max(2*M+N,6*N+2*N*N). 
            -> if JOBV.EQ.'J' the minimal requirement is 
               LWORK >= max(2*M+N, 4*N+N*N,2*N+N*N+6).
            -> For optimal performance, LWORK should be additionally
               larger than N+M*NB, where NB is the optimal block size
               for DORMQR.
[out]IWORK
          IWORK is INTEGER array, dimension M+3*N.
          On exit,
          IWORK(1) = the numerical rank determined after the initial
                     QR factorization with pivoting. See the descriptions
                     of JOBA and JOBR.
          IWORK(2) = the number of the computed nonzero singular values
          IWORK(3) = if nonzero, a warning message:
                     If IWORK(3).EQ.1 then some of the column norms of A
                     were denormalized floats. The requested high accuracy
                     is not warranted by the data.
[out]INFO
          INFO is INTEGER
           < 0  : if INFO = -i, then the i-th argument had an illegal value.
           = 0 :  successfull exit;
           > 0 :  DGEJSV  did not converge in the maximal allowed number
                  of sweeps. The computed values may be inaccurate.
Author
Univ. of Tennessee
Univ. of California Berkeley
Univ. of Colorado Denver
NAG Ltd.
Date
September 2012
Further Details:
  DGEJSV implements a preconditioned Jacobi SVD algorithm. It uses DGEQP3,
  DGEQRF, and DGELQF as preprocessors and preconditioners. Optionally, an
  additional row pivoting can be used as a preprocessor, which in some
  cases results in much higher accuracy. An example is matrix A with the
  structure A = D1 * C * D2, where D1, D2 are arbitrarily ill-conditioned
  diagonal matrices and C is well-conditioned matrix. In that case, complete
  pivoting in the first QR factorizations provides accuracy dependent on the
  condition number of C, and independent of D1, D2. Such higher accuracy is
  not completely understood theoretically, but it works well in practice.
  Further, if A can be written as A = B*D, with well-conditioned B and some
  diagonal D, then the high accuracy is guaranteed, both theoretically and
  in software, independent of D. For more details see [1], [2].
     The computational range for the singular values can be the full range
  ( UNDERFLOW,OVERFLOW ), provided that the machine arithmetic and the BLAS
  & LAPACK routines called by DGEJSV are implemented to work in that range.
  If that is not the case, then the restriction for safe computation with
  the singular values in the range of normalized IEEE numbers is that the
  spectral condition number kappa(A)=sigma_max(A)/sigma_min(A) does not
  overflow. This code (DGEJSV) is best used in this restricted range,
  meaning that singular values of magnitude below ||A||_2 / DLAMCH('O') are
  returned as zeros. See JOBR for details on this.
     Further, this implementation is somewhat slower than the one described
  in [1,2] due to replacement of some non-LAPACK components, and because
  the choice of some tuning parameters in the iterative part (DGESVJ) is
  left to the implementer on a particular machine.
     The rank revealing QR factorization (in this code: DGEQP3) should be
  implemented as in [3]. We have a new version of DGEQP3 under development
  that is more robust than the current one in LAPACK, with a cleaner cut in
  rank defficient cases. It will be available in the SIGMA library [4].
  If M is much larger than N, it is obvious that the inital QRF with
  column pivoting can be preprocessed by the QRF without pivoting. That
  well known trick is not used in DGEJSV because in some cases heavy row
  weighting can be treated with complete pivoting. The overhead in cases
  M much larger than N is then only due to pivoting, but the benefits in
  terms of accuracy have prevailed. The implementer/user can incorporate
  this extra QRF step easily. The implementer can also improve data movement
  (matrix transpose, matrix copy, matrix transposed copy) - this
  implementation of DGEJSV uses only the simplest, naive data movement.
Contributors:
Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany)
References:
 [1] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I.
     SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342.
     LAPACK Working note 169.
 [2] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II.
     SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362.
     LAPACK Working note 170.
 [3] Z. Drmac and Z. Bujanovic: On the failure of rank-revealing QR
     factorization software - a case study.
     ACM Trans. Math. Softw. Vol. 35, No 2 (2008), pp. 1-28.
     LAPACK Working note 176.
 [4] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV,
     QSVD, (H,K)-SVD computations.
     Department of Mathematics, University of Zagreb, 2008.
Bugs, examples and comments:
Please report all bugs and send interesting examples and/or comments to drmac.nosp@m.@mat.nosp@m.h.hr. Thank you.

Definition at line 476 of file dgejsv.f.

476 *
477 * -- LAPACK computational routine (version 3.4.2) --
478 * -- LAPACK is a software package provided by Univ. of Tennessee, --
479 * -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
480 * September 2012
481 *
482 * .. Scalar Arguments ..
483  IMPLICIT NONE
484  INTEGER info, lda, ldu, ldv, lwork, m, n
485 * ..
486 * .. Array Arguments ..
487  DOUBLE PRECISION a( lda, * ), sva( n ), u( ldu, * ), v( ldv, * ),
488  $ work( lwork )
489  INTEGER iwork( * )
490  CHARACTER*1 joba, jobp, jobr, jobt, jobu, jobv
491 * ..
492 *
493 * ===========================================================================
494 *
495 * .. Local Parameters ..
496  DOUBLE PRECISION zero, one
497  parameter( zero = 0.0d0, one = 1.0d0 )
498 * ..
499 * .. Local Scalars ..
500  DOUBLE PRECISION aapp, aaqq, aatmax, aatmin, big, big1, cond_ok,
501  $ condr1, condr2, entra, entrat, epsln, maxprj, scalem,
502  $ sconda, sfmin, small, temp1, uscal1, uscal2, xsc
503  INTEGER ierr, n1, nr, numrank, p, q, warning
504  LOGICAL almort, defr, errest, goscal, jracc, kill, lsvec,
505  $ l2aber, l2kill, l2pert, l2rank, l2tran,
506  $ noscal, rowpiv, rsvec, transp
507 * ..
508 * .. Intrinsic Functions ..
509  INTRINSIC dabs, dlog, dmax1, dmin1, dble,
510  $ max0, min0, idnint, dsign, dsqrt
511 * ..
512 * .. External Functions ..
513  DOUBLE PRECISION dlamch, dnrm2
514  INTEGER idamax
515  LOGICAL lsame
516  EXTERNAL idamax, lsame, dlamch, dnrm2
517 * ..
518 * .. External Subroutines ..
519  EXTERNAL dcopy, dgelqf, dgeqp3, dgeqrf, dlacpy, dlascl,
522 *
523  EXTERNAL dgesvj
524 * ..
525 *
526 * Test the input arguments
527 *
528  lsvec = lsame( jobu, 'U' ) .OR. lsame( jobu, 'F' )
529  jracc = lsame( jobv, 'J' )
530  rsvec = lsame( jobv, 'V' ) .OR. jracc
531  rowpiv = lsame( joba, 'F' ) .OR. lsame( joba, 'G' )
532  l2rank = lsame( joba, 'R' )
533  l2aber = lsame( joba, 'A' )
534  errest = lsame( joba, 'E' ) .OR. lsame( joba, 'G' )
535  l2tran = lsame( jobt, 'T' )
536  l2kill = lsame( jobr, 'R' )
537  defr = lsame( jobr, 'N' )
538  l2pert = lsame( jobp, 'P' )
539 *
540  IF ( .NOT.(rowpiv .OR. l2rank .OR. l2aber .OR.
541  $ errest .OR. lsame( joba, 'C' ) )) THEN
542  info = - 1
543  ELSE IF ( .NOT.( lsvec .OR. lsame( jobu, 'N' ) .OR.
544  $ lsame( jobu, 'W' )) ) THEN
545  info = - 2
546  ELSE IF ( .NOT.( rsvec .OR. lsame( jobv, 'N' ) .OR.
547  $ lsame( jobv, 'W' )) .OR. ( jracc .AND. (.NOT.lsvec) ) ) THEN
548  info = - 3
549  ELSE IF ( .NOT. ( l2kill .OR. defr ) ) THEN
550  info = - 4
551  ELSE IF ( .NOT. ( l2tran .OR. lsame( jobt, 'N' ) ) ) THEN
552  info = - 5
553  ELSE IF ( .NOT. ( l2pert .OR. lsame( jobp, 'N' ) ) ) THEN
554  info = - 6
555  ELSE IF ( m .LT. 0 ) THEN
556  info = - 7
557  ELSE IF ( ( n .LT. 0 ) .OR. ( n .GT. m ) ) THEN
558  info = - 8
559  ELSE IF ( lda .LT. m ) THEN
560  info = - 10
561  ELSE IF ( lsvec .AND. ( ldu .LT. m ) ) THEN
562  info = - 13
563  ELSE IF ( rsvec .AND. ( ldv .LT. n ) ) THEN
564  info = - 14
565  ELSE IF ( (.NOT.(lsvec .OR. rsvec .OR. errest).AND.
566  & (lwork .LT. max0(7,4*n+1,2*m+n))) .OR.
567  & (.NOT.(lsvec .OR. rsvec) .AND. errest .AND.
568  & (lwork .LT. max0(7,4*n+n*n,2*m+n))) .OR.
569  & (lsvec .AND. (.NOT.rsvec) .AND. (lwork .LT. max0(7,2*m+n,4*n+1)))
570  & .OR.
571  & (rsvec .AND. (.NOT.lsvec) .AND. (lwork .LT. max0(7,2*m+n,4*n+1)))
572  & .OR.
573  & (lsvec .AND. rsvec .AND. (.NOT.jracc) .AND.
574  & (lwork.LT.max0(2*m+n,6*n+2*n*n)))
575  & .OR. (lsvec .AND. rsvec .AND. jracc .AND.
576  & lwork.LT.max0(2*m+n,4*n+n*n,2*n+n*n+6)))
577  & THEN
578  info = - 17
579  ELSE
580 * #:)
581  info = 0
582  END IF
583 *
584  IF ( info .NE. 0 ) THEN
585 * #:(
586  CALL xerbla( 'DGEJSV', - info )
587  RETURN
588  END IF
589 *
590 * Quick return for void matrix (Y3K safe)
591 * #:)
592  IF ( ( m .EQ. 0 ) .OR. ( n .EQ. 0 ) ) RETURN
593 *
594 * Determine whether the matrix U should be M x N or M x M
595 *
596  IF ( lsvec ) THEN
597  n1 = n
598  IF ( lsame( jobu, 'F' ) ) n1 = m
599  END IF
600 *
601 * Set numerical parameters
602 *
603 *! NOTE: Make sure DLAMCH() does not fail on the target architecture.
604 *
605  epsln = dlamch('Epsilon')
606  sfmin = dlamch('SafeMinimum')
607  small = sfmin / epsln
608  big = dlamch('O')
609 * BIG = ONE / SFMIN
610 *
611 * Initialize SVA(1:N) = diag( ||A e_i||_2 )_1^N
612 *
613 *(!) If necessary, scale SVA() to protect the largest norm from
614 * overflow. It is possible that this scaling pushes the smallest
615 * column norm left from the underflow threshold (extreme case).
616 *
617  scalem = one / dsqrt(dble(m)*dble(n))
618  noscal = .true.
619  goscal = .true.
620  DO 1874 p = 1, n
621  aapp = zero
622  aaqq = one
623  CALL dlassq( m, a(1,p), 1, aapp, aaqq )
624  IF ( aapp .GT. big ) THEN
625  info = - 9
626  CALL xerbla( 'DGEJSV', -info )
627  RETURN
628  END IF
629  aaqq = dsqrt(aaqq)
630  IF ( ( aapp .LT. (big / aaqq) ) .AND. noscal ) THEN
631  sva(p) = aapp * aaqq
632  ELSE
633  noscal = .false.
634  sva(p) = aapp * ( aaqq * scalem )
635  IF ( goscal ) THEN
636  goscal = .false.
637  CALL dscal( p-1, scalem, sva, 1 )
638  END IF
639  END IF
640  1874 CONTINUE
641 *
642  IF ( noscal ) scalem = one
643 *
644  aapp = zero
645  aaqq = big
646  DO 4781 p = 1, n
647  aapp = dmax1( aapp, sva(p) )
648  IF ( sva(p) .NE. zero ) aaqq = dmin1( aaqq, sva(p) )
649  4781 CONTINUE
650 *
651 * Quick return for zero M x N matrix
652 * #:)
653  IF ( aapp .EQ. zero ) THEN
654  IF ( lsvec ) CALL dlaset( 'G', m, n1, zero, one, u, ldu )
655  IF ( rsvec ) CALL dlaset( 'G', n, n, zero, one, v, ldv )
656  work(1) = one
657  work(2) = one
658  IF ( errest ) work(3) = one
659  IF ( lsvec .AND. rsvec ) THEN
660  work(4) = one
661  work(5) = one
662  END IF
663  IF ( l2tran ) THEN
664  work(6) = zero
665  work(7) = zero
666  END IF
667  iwork(1) = 0
668  iwork(2) = 0
669  iwork(3) = 0
670  RETURN
671  END IF
672 *
673 * Issue warning if denormalized column norms detected. Override the
674 * high relative accuracy request. Issue licence to kill columns
675 * (set them to zero) whose norm is less than sigma_max / BIG (roughly).
676 * #:(
677  warning = 0
678  IF ( aaqq .LE. sfmin ) THEN
679  l2rank = .true.
680  l2kill = .true.
681  warning = 1
682  END IF
683 *
684 * Quick return for one-column matrix
685 * #:)
686  IF ( n .EQ. 1 ) THEN
687 *
688  IF ( lsvec ) THEN
689  CALL dlascl( 'G',0,0,sva(1),scalem, m,1,a(1,1),lda,ierr )
690  CALL dlacpy( 'A', m, 1, a, lda, u, ldu )
691 * computing all M left singular vectors of the M x 1 matrix
692  IF ( n1 .NE. n ) THEN
693  CALL dgeqrf( m, n, u,ldu, work, work(n+1),lwork-n,ierr )
694  CALL dorgqr( m,n1,1, u,ldu,work,work(n+1),lwork-n,ierr )
695  CALL dcopy( m, a(1,1), 1, u(1,1), 1 )
696  END IF
697  END IF
698  IF ( rsvec ) THEN
699  v(1,1) = one
700  END IF
701  IF ( sva(1) .LT. (big*scalem) ) THEN
702  sva(1) = sva(1) / scalem
703  scalem = one
704  END IF
705  work(1) = one / scalem
706  work(2) = one
707  IF ( sva(1) .NE. zero ) THEN
708  iwork(1) = 1
709  IF ( ( sva(1) / scalem) .GE. sfmin ) THEN
710  iwork(2) = 1
711  ELSE
712  iwork(2) = 0
713  END IF
714  ELSE
715  iwork(1) = 0
716  iwork(2) = 0
717  END IF
718  IF ( errest ) work(3) = one
719  IF ( lsvec .AND. rsvec ) THEN
720  work(4) = one
721  work(5) = one
722  END IF
723  IF ( l2tran ) THEN
724  work(6) = zero
725  work(7) = zero
726  END IF
727  RETURN
728 *
729  END IF
730 *
731  transp = .false.
732  l2tran = l2tran .AND. ( m .EQ. n )
733 *
734  aatmax = -one
735  aatmin = big
736  IF ( rowpiv .OR. l2tran ) THEN
737 *
738 * Compute the row norms, needed to determine row pivoting sequence
739 * (in the case of heavily row weighted A, row pivoting is strongly
740 * advised) and to collect information needed to compare the
741 * structures of A * A^t and A^t * A (in the case L2TRAN.EQ..TRUE.).
742 *
743  IF ( l2tran ) THEN
744  DO 1950 p = 1, m
745  xsc = zero
746  temp1 = one
747  CALL dlassq( n, a(p,1), lda, xsc, temp1 )
748 * DLASSQ gets both the ell_2 and the ell_infinity norm
749 * in one pass through the vector
750  work(m+n+p) = xsc * scalem
751  work(n+p) = xsc * (scalem*dsqrt(temp1))
752  aatmax = dmax1( aatmax, work(n+p) )
753  IF (work(n+p) .NE. zero) aatmin = dmin1(aatmin,work(n+p))
754  1950 CONTINUE
755  ELSE
756  DO 1904 p = 1, m
757  work(m+n+p) = scalem*dabs( a(p,idamax(n,a(p,1),lda)) )
758  aatmax = dmax1( aatmax, work(m+n+p) )
759  aatmin = dmin1( aatmin, work(m+n+p) )
760  1904 CONTINUE
761  END IF
762 *
763  END IF
764 *
765 * For square matrix A try to determine whether A^t would be better
766 * input for the preconditioned Jacobi SVD, with faster convergence.
767 * The decision is based on an O(N) function of the vector of column
768 * and row norms of A, based on the Shannon entropy. This should give
769 * the right choice in most cases when the difference actually matters.
770 * It may fail and pick the slower converging side.
771 *
772  entra = zero
773  entrat = zero
774  IF ( l2tran ) THEN
775 *
776  xsc = zero
777  temp1 = one
778  CALL dlassq( n, sva, 1, xsc, temp1 )
779  temp1 = one / temp1
780 *
781  entra = zero
782  DO 1113 p = 1, n
783  big1 = ( ( sva(p) / xsc )**2 ) * temp1
784  IF ( big1 .NE. zero ) entra = entra + big1 * dlog(big1)
785  1113 CONTINUE
786  entra = - entra / dlog(dble(n))
787 *
788 * Now, SVA().^2/Trace(A^t * A) is a point in the probability simplex.
789 * It is derived from the diagonal of A^t * A. Do the same with the
790 * diagonal of A * A^t, compute the entropy of the corresponding
791 * probability distribution. Note that A * A^t and A^t * A have the
792 * same trace.
793 *
794  entrat = zero
795  DO 1114 p = n+1, n+m
796  big1 = ( ( work(p) / xsc )**2 ) * temp1
797  IF ( big1 .NE. zero ) entrat = entrat + big1 * dlog(big1)
798  1114 CONTINUE
799  entrat = - entrat / dlog(dble(m))
800 *
801 * Analyze the entropies and decide A or A^t. Smaller entropy
802 * usually means better input for the algorithm.
803 *
804  transp = ( entrat .LT. entra )
805 *
806 * If A^t is better than A, transpose A.
807 *
808  IF ( transp ) THEN
809 * In an optimal implementation, this trivial transpose
810 * should be replaced with faster transpose.
811  DO 1115 p = 1, n - 1
812  DO 1116 q = p + 1, n
813  temp1 = a(q,p)
814  a(q,p) = a(p,q)
815  a(p,q) = temp1
816  1116 CONTINUE
817  1115 CONTINUE
818  DO 1117 p = 1, n
819  work(m+n+p) = sva(p)
820  sva(p) = work(n+p)
821  1117 CONTINUE
822  temp1 = aapp
823  aapp = aatmax
824  aatmax = temp1
825  temp1 = aaqq
826  aaqq = aatmin
827  aatmin = temp1
828  kill = lsvec
829  lsvec = rsvec
830  rsvec = kill
831  IF ( lsvec ) n1 = n
832 *
833  rowpiv = .true.
834  END IF
835 *
836  END IF
837 * END IF L2TRAN
838 *
839 * Scale the matrix so that its maximal singular value remains less
840 * than DSQRT(BIG) -- the matrix is scaled so that its maximal column
841 * has Euclidean norm equal to DSQRT(BIG/N). The only reason to keep
842 * DSQRT(BIG) instead of BIG is the fact that DGEJSV uses LAPACK and
843 * BLAS routines that, in some implementations, are not capable of
844 * working in the full interval [SFMIN,BIG] and that they may provoke
845 * overflows in the intermediate results. If the singular values spread
846 * from SFMIN to BIG, then DGESVJ will compute them. So, in that case,
847 * one should use DGESVJ instead of DGEJSV.
848 *
849  big1 = dsqrt( big )
850  temp1 = dsqrt( big / dble(n) )
851 *
852  CALL dlascl( 'G', 0, 0, aapp, temp1, n, 1, sva, n, ierr )
853  IF ( aaqq .GT. (aapp * sfmin) ) THEN
854  aaqq = ( aaqq / aapp ) * temp1
855  ELSE
856  aaqq = ( aaqq * temp1 ) / aapp
857  END IF
858  temp1 = temp1 * scalem
859  CALL dlascl( 'G', 0, 0, aapp, temp1, m, n, a, lda, ierr )
860 *
861 * To undo scaling at the end of this procedure, multiply the
862 * computed singular values with USCAL2 / USCAL1.
863 *
864  uscal1 = temp1
865  uscal2 = aapp
866 *
867  IF ( l2kill ) THEN
868 * L2KILL enforces computation of nonzero singular values in
869 * the restricted range of condition number of the initial A,
870 * sigma_max(A) / sigma_min(A) approx. DSQRT(BIG)/DSQRT(SFMIN).
871  xsc = dsqrt( sfmin )
872  ELSE
873  xsc = small
874 *
875 * Now, if the condition number of A is too big,
876 * sigma_max(A) / sigma_min(A) .GT. DSQRT(BIG/N) * EPSLN / SFMIN,
877 * as a precaution measure, the full SVD is computed using DGESVJ
878 * with accumulated Jacobi rotations. This provides numerically
879 * more robust computation, at the cost of slightly increased run
880 * time. Depending on the concrete implementation of BLAS and LAPACK
881 * (i.e. how they behave in presence of extreme ill-conditioning) the
882 * implementor may decide to remove this switch.
883  IF ( ( aaqq.LT.dsqrt(sfmin) ) .AND. lsvec .AND. rsvec ) THEN
884  jracc = .true.
885  END IF
886 *
887  END IF
888  IF ( aaqq .LT. xsc ) THEN
889  DO 700 p = 1, n
890  IF ( sva(p) .LT. xsc ) THEN
891  CALL dlaset( 'A', m, 1, zero, zero, a(1,p), lda )
892  sva(p) = zero
893  END IF
894  700 CONTINUE
895  END IF
896 *
897 * Preconditioning using QR factorization with pivoting
898 *
899  IF ( rowpiv ) THEN
900 * Optional row permutation (Bjoerck row pivoting):
901 * A result by Cox and Higham shows that the Bjoerck's
902 * row pivoting combined with standard column pivoting
903 * has similar effect as Powell-Reid complete pivoting.
904 * The ell-infinity norms of A are made nonincreasing.
905  DO 1952 p = 1, m - 1
906  q = idamax( m-p+1, work(m+n+p), 1 ) + p - 1
907  iwork(2*n+p) = q
908  IF ( p .NE. q ) THEN
909  temp1 = work(m+n+p)
910  work(m+n+p) = work(m+n+q)
911  work(m+n+q) = temp1
912  END IF
913  1952 CONTINUE
914  CALL dlaswp( n, a, lda, 1, m-1, iwork(2*n+1), 1 )
915  END IF
916 *
917 * End of the preparation phase (scaling, optional sorting and
918 * transposing, optional flushing of small columns).
919 *
920 * Preconditioning
921 *
922 * If the full SVD is needed, the right singular vectors are computed
923 * from a matrix equation, and for that we need theoretical analysis
924 * of the Businger-Golub pivoting. So we use DGEQP3 as the first RR QRF.
925 * In all other cases the first RR QRF can be chosen by other criteria
926 * (eg speed by replacing global with restricted window pivoting, such
927 * as in SGEQPX from TOMS # 782). Good results will be obtained using
928 * SGEQPX with properly (!) chosen numerical parameters.
929 * Any improvement of DGEQP3 improves overal performance of DGEJSV.
930 *
931 * A * P1 = Q1 * [ R1^t 0]^t:
932  DO 1963 p = 1, n
933 * .. all columns are free columns
934  iwork(p) = 0
935  1963 CONTINUE
936  CALL dgeqp3( m,n,a,lda, iwork,work, work(n+1),lwork-n, ierr )
937 *
938 * The upper triangular matrix R1 from the first QRF is inspected for
939 * rank deficiency and possibilities for deflation, or possible
940 * ill-conditioning. Depending on the user specified flag L2RANK,
941 * the procedure explores possibilities to reduce the numerical
942 * rank by inspecting the computed upper triangular factor. If
943 * L2RANK or L2ABER are up, then DGEJSV will compute the SVD of
944 * A + dA, where ||dA|| <= f(M,N)*EPSLN.
945 *
946  nr = 1
947  IF ( l2aber ) THEN
948 * Standard absolute error bound suffices. All sigma_i with
949 * sigma_i < N*EPSLN*||A|| are flushed to zero. This is an
950 * agressive enforcement of lower numerical rank by introducing a
951 * backward error of the order of N*EPSLN*||A||.
952  temp1 = dsqrt(dble(n))*epsln
953  DO 3001 p = 2, n
954  IF ( dabs(a(p,p)) .GE. (temp1*dabs(a(1,1))) ) THEN
955  nr = nr + 1
956  ELSE
957  GO TO 3002
958  END IF
959  3001 CONTINUE
960  3002 CONTINUE
961  ELSE IF ( l2rank ) THEN
962 * .. similarly as above, only slightly more gentle (less agressive).
963 * Sudden drop on the diagonal of R1 is used as the criterion for
964 * close-to-rank-defficient.
965  temp1 = dsqrt(sfmin)
966  DO 3401 p = 2, n
967  IF ( ( dabs(a(p,p)) .LT. (epsln*dabs(a(p-1,p-1))) ) .OR.
968  $ ( dabs(a(p,p)) .LT. small ) .OR.
969  $ ( l2kill .AND. (dabs(a(p,p)) .LT. temp1) ) ) GO TO 3402
970  nr = nr + 1
971  3401 CONTINUE
972  3402 CONTINUE
973 *
974  ELSE
975 * The goal is high relative accuracy. However, if the matrix
976 * has high scaled condition number the relative accuracy is in
977 * general not feasible. Later on, a condition number estimator
978 * will be deployed to estimate the scaled condition number.
979 * Here we just remove the underflowed part of the triangular
980 * factor. This prevents the situation in which the code is
981 * working hard to get the accuracy not warranted by the data.
982  temp1 = dsqrt(sfmin)
983  DO 3301 p = 2, n
984  IF ( ( dabs(a(p,p)) .LT. small ) .OR.
985  $ ( l2kill .AND. (dabs(a(p,p)) .LT. temp1) ) ) GO TO 3302
986  nr = nr + 1
987  3301 CONTINUE
988  3302 CONTINUE
989 *
990  END IF
991 *
992  almort = .false.
993  IF ( nr .EQ. n ) THEN
994  maxprj = one
995  DO 3051 p = 2, n
996  temp1 = dabs(a(p,p)) / sva(iwork(p))
997  maxprj = dmin1( maxprj, temp1 )
998  3051 CONTINUE
999  IF ( maxprj**2 .GE. one - dble(n)*epsln ) almort = .true.
1000  END IF
1001 *
1002 *
1003  sconda = - one
1004  condr1 = - one
1005  condr2 = - one
1006 *
1007  IF ( errest ) THEN
1008  IF ( n .EQ. nr ) THEN
1009  IF ( rsvec ) THEN
1010 * .. V is available as workspace
1011  CALL dlacpy( 'U', n, n, a, lda, v, ldv )
1012  DO 3053 p = 1, n
1013  temp1 = sva(iwork(p))
1014  CALL dscal( p, one/temp1, v(1,p), 1 )
1015  3053 CONTINUE
1016  CALL dpocon( 'U', n, v, ldv, one, temp1,
1017  $ work(n+1), iwork(2*n+m+1), ierr )
1018  ELSE IF ( lsvec ) THEN
1019 * .. U is available as workspace
1020  CALL dlacpy( 'U', n, n, a, lda, u, ldu )
1021  DO 3054 p = 1, n
1022  temp1 = sva(iwork(p))
1023  CALL dscal( p, one/temp1, u(1,p), 1 )
1024  3054 CONTINUE
1025  CALL dpocon( 'U', n, u, ldu, one, temp1,
1026  $ work(n+1), iwork(2*n+m+1), ierr )
1027  ELSE
1028  CALL dlacpy( 'U', n, n, a, lda, work(n+1), n )
1029  DO 3052 p = 1, n
1030  temp1 = sva(iwork(p))
1031  CALL dscal( p, one/temp1, work(n+(p-1)*n+1), 1 )
1032  3052 CONTINUE
1033 * .. the columns of R are scaled to have unit Euclidean lengths.
1034  CALL dpocon( 'U', n, work(n+1), n, one, temp1,
1035  $ work(n+n*n+1), iwork(2*n+m+1), ierr )
1036  END IF
1037  sconda = one / dsqrt(temp1)
1038 * SCONDA is an estimate of DSQRT(||(R^t * R)^(-1)||_1).
1039 * N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA
1040  ELSE
1041  sconda = - one
1042  END IF
1043  END IF
1044 *
1045  l2pert = l2pert .AND. ( dabs( a(1,1)/a(nr,nr) ) .GT. dsqrt(big1) )
1046 * If there is no violent scaling, artificial perturbation is not needed.
1047 *
1048 * Phase 3:
1049 *
1050  IF ( .NOT. ( rsvec .OR. lsvec ) ) THEN
1051 *
1052 * Singular Values only
1053 *
1054 * .. transpose A(1:NR,1:N)
1055  DO 1946 p = 1, min0( n-1, nr )
1056  CALL dcopy( n-p, a(p,p+1), lda, a(p+1,p), 1 )
1057  1946 CONTINUE
1058 *
1059 * The following two DO-loops introduce small relative perturbation
1060 * into the strict upper triangle of the lower triangular matrix.
1061 * Small entries below the main diagonal are also changed.
1062 * This modification is useful if the computing environment does not
1063 * provide/allow FLUSH TO ZERO underflow, for it prevents many
1064 * annoying denormalized numbers in case of strongly scaled matrices.
1065 * The perturbation is structured so that it does not introduce any
1066 * new perturbation of the singular values, and it does not destroy
1067 * the job done by the preconditioner.
1068 * The licence for this perturbation is in the variable L2PERT, which
1069 * should be .FALSE. if FLUSH TO ZERO underflow is active.
1070 *
1071  IF ( .NOT. almort ) THEN
1072 *
1073  IF ( l2pert ) THEN
1074 * XSC = DSQRT(SMALL)
1075  xsc = epsln / dble(n)
1076  DO 4947 q = 1, nr
1077  temp1 = xsc*dabs(a(q,q))
1078  DO 4949 p = 1, n
1079  IF ( ( (p.GT.q) .AND. (dabs(a(p,q)).LE.temp1) )
1080  $ .OR. ( p .LT. q ) )
1081  $ a(p,q) = dsign( temp1, a(p,q) )
1082  4949 CONTINUE
1083  4947 CONTINUE
1084  ELSE
1085  CALL dlaset( 'U', nr-1,nr-1, zero,zero, a(1,2),lda )
1086  END IF
1087 *
1088 * .. second preconditioning using the QR factorization
1089 *
1090  CALL dgeqrf( n,nr, a,lda, work, work(n+1),lwork-n, ierr )
1091 *
1092 * .. and transpose upper to lower triangular
1093  DO 1948 p = 1, nr - 1
1094  CALL dcopy( nr-p, a(p,p+1), lda, a(p+1,p), 1 )
1095  1948 CONTINUE
1096 *
1097  END IF
1098 *
1099 * Row-cyclic Jacobi SVD algorithm with column pivoting
1100 *
1101 * .. again some perturbation (a "background noise") is added
1102 * to drown denormals
1103  IF ( l2pert ) THEN
1104 * XSC = DSQRT(SMALL)
1105  xsc = epsln / dble(n)
1106  DO 1947 q = 1, nr
1107  temp1 = xsc*dabs(a(q,q))
1108  DO 1949 p = 1, nr
1109  IF ( ( (p.GT.q) .AND. (dabs(a(p,q)).LE.temp1) )
1110  $ .OR. ( p .LT. q ) )
1111  $ a(p,q) = dsign( temp1, a(p,q) )
1112  1949 CONTINUE
1113  1947 CONTINUE
1114  ELSE
1115  CALL dlaset( 'U', nr-1, nr-1, zero, zero, a(1,2), lda )
1116  END IF
1117 *
1118 * .. and one-sided Jacobi rotations are started on a lower
1119 * triangular matrix (plus perturbation which is ignored in
1120 * the part which destroys triangular form (confusing?!))
1121 *
1122  CALL dgesvj( 'L', 'NoU', 'NoV', nr, nr, a, lda, sva,
1123  $ n, v, ldv, work, lwork, info )
1124 *
1125  scalem = work(1)
1126  numrank = idnint(work(2))
1127 *
1128 *
1129  ELSE IF ( rsvec .AND. ( .NOT. lsvec ) ) THEN
1130 *
1131 * -> Singular Values and Right Singular Vectors <-
1132 *
1133  IF ( almort ) THEN
1134 *
1135 * .. in this case NR equals N
1136  DO 1998 p = 1, nr
1137  CALL dcopy( n-p+1, a(p,p), lda, v(p,p), 1 )
1138  1998 CONTINUE
1139  CALL dlaset( 'Upper', nr-1, nr-1, zero, zero, v(1,2), ldv )
1140 *
1141  CALL dgesvj( 'L','U','N', n, nr, v,ldv, sva, nr, a,lda,
1142  $ work, lwork, info )
1143  scalem = work(1)
1144  numrank = idnint(work(2))
1145 
1146  ELSE
1147 *
1148 * .. two more QR factorizations ( one QRF is not enough, two require
1149 * accumulated product of Jacobi rotations, three are perfect )
1150 *
1151  CALL dlaset( 'Lower', nr-1, nr-1, zero, zero, a(2,1), lda )
1152  CALL dgelqf( nr, n, a, lda, work, work(n+1), lwork-n, ierr)
1153  CALL dlacpy( 'Lower', nr, nr, a, lda, v, ldv )
1154  CALL dlaset( 'Upper', nr-1, nr-1, zero, zero, v(1,2), ldv )
1155  CALL dgeqrf( nr, nr, v, ldv, work(n+1), work(2*n+1),
1156  $ lwork-2*n, ierr )
1157  DO 8998 p = 1, nr
1158  CALL dcopy( nr-p+1, v(p,p), ldv, v(p,p), 1 )
1159  8998 CONTINUE
1160  CALL dlaset( 'Upper', nr-1, nr-1, zero, zero, v(1,2), ldv )
1161 *
1162  CALL dgesvj( 'Lower', 'U','N', nr, nr, v,ldv, sva, nr, u,
1163  $ ldu, work(n+1), lwork, info )
1164  scalem = work(n+1)
1165  numrank = idnint(work(n+2))
1166  IF ( nr .LT. n ) THEN
1167  CALL dlaset( 'A',n-nr, nr, zero,zero, v(nr+1,1), ldv )
1168  CALL dlaset( 'A',nr, n-nr, zero,zero, v(1,nr+1), ldv )
1169  CALL dlaset( 'A',n-nr,n-nr,zero,one, v(nr+1,nr+1), ldv )
1170  END IF
1171 *
1172  CALL dormlq( 'Left', 'Transpose', n, n, nr, a, lda, work,
1173  $ v, ldv, work(n+1), lwork-n, ierr )
1174 *
1175  END IF
1176 *
1177  DO 8991 p = 1, n
1178  CALL dcopy( n, v(p,1), ldv, a(iwork(p),1), lda )
1179  8991 CONTINUE
1180  CALL dlacpy( 'All', n, n, a, lda, v, ldv )
1181 *
1182  IF ( transp ) THEN
1183  CALL dlacpy( 'All', n, n, v, ldv, u, ldu )
1184  END IF
1185 *
1186  ELSE IF ( lsvec .AND. ( .NOT. rsvec ) ) THEN
1187 *
1188 * .. Singular Values and Left Singular Vectors ..
1189 *
1190 * .. second preconditioning step to avoid need to accumulate
1191 * Jacobi rotations in the Jacobi iterations.
1192  DO 1965 p = 1, nr
1193  CALL dcopy( n-p+1, a(p,p), lda, u(p,p), 1 )
1194  1965 CONTINUE
1195  CALL dlaset( 'Upper', nr-1, nr-1, zero, zero, u(1,2), ldu )
1196 *
1197  CALL dgeqrf( n, nr, u, ldu, work(n+1), work(2*n+1),
1198  $ lwork-2*n, ierr )
1199 *
1200  DO 1967 p = 1, nr - 1
1201  CALL dcopy( nr-p, u(p,p+1), ldu, u(p+1,p), 1 )
1202  1967 CONTINUE
1203  CALL dlaset( 'Upper', nr-1, nr-1, zero, zero, u(1,2), ldu )
1204 *
1205  CALL dgesvj( 'Lower', 'U', 'N', nr,nr, u, ldu, sva, nr, a,
1206  $ lda, work(n+1), lwork-n, info )
1207  scalem = work(n+1)
1208  numrank = idnint(work(n+2))
1209 *
1210  IF ( nr .LT. m ) THEN
1211  CALL dlaset( 'A', m-nr, nr,zero, zero, u(nr+1,1), ldu )
1212  IF ( nr .LT. n1 ) THEN
1213  CALL dlaset( 'A',nr, n1-nr, zero, zero, u(1,nr+1), ldu )
1214  CALL dlaset( 'A',m-nr,n1-nr,zero,one,u(nr+1,nr+1), ldu )
1215  END IF
1216  END IF
1217 *
1218  CALL dormqr( 'Left', 'No Tr', m, n1, n, a, lda, work, u,
1219  $ ldu, work(n+1), lwork-n, ierr )
1220 *
1221  IF ( rowpiv )
1222  $ CALL dlaswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1223 *
1224  DO 1974 p = 1, n1
1225  xsc = one / dnrm2( m, u(1,p), 1 )
1226  CALL dscal( m, xsc, u(1,p), 1 )
1227  1974 CONTINUE
1228 *
1229  IF ( transp ) THEN
1230  CALL dlacpy( 'All', n, n, u, ldu, v, ldv )
1231  END IF
1232 *
1233  ELSE
1234 *
1235 * .. Full SVD ..
1236 *
1237  IF ( .NOT. jracc ) THEN
1238 *
1239  IF ( .NOT. almort ) THEN
1240 *
1241 * Second Preconditioning Step (QRF [with pivoting])
1242 * Note that the composition of TRANSPOSE, QRF and TRANSPOSE is
1243 * equivalent to an LQF CALL. Since in many libraries the QRF
1244 * seems to be better optimized than the LQF, we do explicit
1245 * transpose and use the QRF. This is subject to changes in an
1246 * optimized implementation of DGEJSV.
1247 *
1248  DO 1968 p = 1, nr
1249  CALL dcopy( n-p+1, a(p,p), lda, v(p,p), 1 )
1250  1968 CONTINUE
1251 *
1252 * .. the following two loops perturb small entries to avoid
1253 * denormals in the second QR factorization, where they are
1254 * as good as zeros. This is done to avoid painfully slow
1255 * computation with denormals. The relative size of the perturbation
1256 * is a parameter that can be changed by the implementer.
1257 * This perturbation device will be obsolete on machines with
1258 * properly implemented arithmetic.
1259 * To switch it off, set L2PERT=.FALSE. To remove it from the
1260 * code, remove the action under L2PERT=.TRUE., leave the ELSE part.
1261 * The following two loops should be blocked and fused with the
1262 * transposed copy above.
1263 *
1264  IF ( l2pert ) THEN
1265  xsc = dsqrt(small)
1266  DO 2969 q = 1, nr
1267  temp1 = xsc*dabs( v(q,q) )
1268  DO 2968 p = 1, n
1269  IF ( ( p .GT. q ) .AND. ( dabs(v(p,q)) .LE. temp1 )
1270  $ .OR. ( p .LT. q ) )
1271  $ v(p,q) = dsign( temp1, v(p,q) )
1272  IF ( p .LT. q ) v(p,q) = - v(p,q)
1273  2968 CONTINUE
1274  2969 CONTINUE
1275  ELSE
1276  CALL dlaset( 'U', nr-1, nr-1, zero, zero, v(1,2), ldv )
1277  END IF
1278 *
1279 * Estimate the row scaled condition number of R1
1280 * (If R1 is rectangular, N > NR, then the condition number
1281 * of the leading NR x NR submatrix is estimated.)
1282 *
1283  CALL dlacpy( 'L', nr, nr, v, ldv, work(2*n+1), nr )
1284  DO 3950 p = 1, nr
1285  temp1 = dnrm2(nr-p+1,work(2*n+(p-1)*nr+p),1)
1286  CALL dscal(nr-p+1,one/temp1,work(2*n+(p-1)*nr+p),1)
1287  3950 CONTINUE
1288  CALL dpocon('Lower',nr,work(2*n+1),nr,one,temp1,
1289  $ work(2*n+nr*nr+1),iwork(m+2*n+1),ierr)
1290  condr1 = one / dsqrt(temp1)
1291 * .. here need a second oppinion on the condition number
1292 * .. then assume worst case scenario
1293 * R1 is OK for inverse <=> CONDR1 .LT. DBLE(N)
1294 * more conservative <=> CONDR1 .LT. DSQRT(DBLE(N))
1295 *
1296  cond_ok = dsqrt(dble(nr))
1297 *[TP] COND_OK is a tuning parameter.
1298 
1299  IF ( condr1 .LT. cond_ok ) THEN
1300 * .. the second QRF without pivoting. Note: in an optimized
1301 * implementation, this QRF should be implemented as the QRF
1302 * of a lower triangular matrix.
1303 * R1^t = Q2 * R2
1304  CALL dgeqrf( n, nr, v, ldv, work(n+1), work(2*n+1),
1305  $ lwork-2*n, ierr )
1306 *
1307  IF ( l2pert ) THEN
1308  xsc = dsqrt(small)/epsln
1309  DO 3959 p = 2, nr
1310  DO 3958 q = 1, p - 1
1311  temp1 = xsc * dmin1(dabs(v(p,p)),dabs(v(q,q)))
1312  IF ( dabs(v(q,p)) .LE. temp1 )
1313  $ v(q,p) = dsign( temp1, v(q,p) )
1314  3958 CONTINUE
1315  3959 CONTINUE
1316  END IF
1317 *
1318  IF ( nr .NE. n )
1319  $ CALL dlacpy( 'A', n, nr, v, ldv, work(2*n+1), n )
1320 * .. save ...
1321 *
1322 * .. this transposed copy should be better than naive
1323  DO 1969 p = 1, nr - 1
1324  CALL dcopy( nr-p, v(p,p+1), ldv, v(p+1,p), 1 )
1325  1969 CONTINUE
1326 *
1327  condr2 = condr1
1328 *
1329  ELSE
1330 *
1331 * .. ill-conditioned case: second QRF with pivoting
1332 * Note that windowed pivoting would be equaly good
1333 * numerically, and more run-time efficient. So, in
1334 * an optimal implementation, the next call to DGEQP3
1335 * should be replaced with eg. CALL SGEQPX (ACM TOMS #782)
1336 * with properly (carefully) chosen parameters.
1337 *
1338 * R1^t * P2 = Q2 * R2
1339  DO 3003 p = 1, nr
1340  iwork(n+p) = 0
1341  3003 CONTINUE
1342  CALL dgeqp3( n, nr, v, ldv, iwork(n+1), work(n+1),
1343  $ work(2*n+1), lwork-2*n, ierr )
1344 ** CALL DGEQRF( N, NR, V, LDV, WORK(N+1), WORK(2*N+1),
1345 ** $ LWORK-2*N, IERR )
1346  IF ( l2pert ) THEN
1347  xsc = dsqrt(small)
1348  DO 3969 p = 2, nr
1349  DO 3968 q = 1, p - 1
1350  temp1 = xsc * dmin1(dabs(v(p,p)),dabs(v(q,q)))
1351  IF ( dabs(v(q,p)) .LE. temp1 )
1352  $ v(q,p) = dsign( temp1, v(q,p) )
1353  3968 CONTINUE
1354  3969 CONTINUE
1355  END IF
1356 *
1357  CALL dlacpy( 'A', n, nr, v, ldv, work(2*n+1), n )
1358 *
1359  IF ( l2pert ) THEN
1360  xsc = dsqrt(small)
1361  DO 8970 p = 2, nr
1362  DO 8971 q = 1, p - 1
1363  temp1 = xsc * dmin1(dabs(v(p,p)),dabs(v(q,q)))
1364  v(p,q) = - dsign( temp1, v(q,p) )
1365  8971 CONTINUE
1366  8970 CONTINUE
1367  ELSE
1368  CALL dlaset( 'L',nr-1,nr-1,zero,zero,v(2,1),ldv )
1369  END IF
1370 * Now, compute R2 = L3 * Q3, the LQ factorization.
1371  CALL dgelqf( nr, nr, v, ldv, work(2*n+n*nr+1),
1372  $ work(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr, ierr )
1373 * .. and estimate the condition number
1374  CALL dlacpy( 'L',nr,nr,v,ldv,work(2*n+n*nr+nr+1),nr )
1375  DO 4950 p = 1, nr
1376  temp1 = dnrm2( p, work(2*n+n*nr+nr+p), nr )
1377  CALL dscal( p, one/temp1, work(2*n+n*nr+nr+p), nr )
1378  4950 CONTINUE
1379  CALL dpocon( 'L',nr,work(2*n+n*nr+nr+1),nr,one,temp1,
1380  $ work(2*n+n*nr+nr+nr*nr+1),iwork(m+2*n+1),ierr )
1381  condr2 = one / dsqrt(temp1)
1382 *
1383  IF ( condr2 .GE. cond_ok ) THEN
1384 * .. save the Householder vectors used for Q3
1385 * (this overwrittes the copy of R2, as it will not be
1386 * needed in this branch, but it does not overwritte the
1387 * Huseholder vectors of Q2.).
1388  CALL dlacpy( 'U', nr, nr, v, ldv, work(2*n+1), n )
1389 * .. and the rest of the information on Q3 is in
1390 * WORK(2*N+N*NR+1:2*N+N*NR+N)
1391  END IF
1392 *
1393  END IF
1394 *
1395  IF ( l2pert ) THEN
1396  xsc = dsqrt(small)
1397  DO 4968 q = 2, nr
1398  temp1 = xsc * v(q,q)
1399  DO 4969 p = 1, q - 1
1400 * V(p,q) = - DSIGN( TEMP1, V(q,p) )
1401  v(p,q) = - dsign( temp1, v(p,q) )
1402  4969 CONTINUE
1403  4968 CONTINUE
1404  ELSE
1405  CALL dlaset( 'U', nr-1,nr-1, zero,zero, v(1,2), ldv )
1406  END IF
1407 *
1408 * Second preconditioning finished; continue with Jacobi SVD
1409 * The input matrix is lower trinagular.
1410 *
1411 * Recover the right singular vectors as solution of a well
1412 * conditioned triangular matrix equation.
1413 *
1414  IF ( condr1 .LT. cond_ok ) THEN
1415 *
1416  CALL dgesvj( 'L','U','N',nr,nr,v,ldv,sva,nr,u,
1417  $ ldu,work(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,info )
1418  scalem = work(2*n+n*nr+nr+1)
1419  numrank = idnint(work(2*n+n*nr+nr+2))
1420  DO 3970 p = 1, nr
1421  CALL dcopy( nr, v(1,p), 1, u(1,p), 1 )
1422  CALL dscal( nr, sva(p), v(1,p), 1 )
1423  3970 CONTINUE
1424 
1425 * .. pick the right matrix equation and solve it
1426 *
1427  IF ( nr .EQ. n ) THEN
1428 * :)) .. best case, R1 is inverted. The solution of this matrix
1429 * equation is Q2*V2 = the product of the Jacobi rotations
1430 * used in DGESVJ, premultiplied with the orthogonal matrix
1431 * from the second QR factorization.
1432  CALL dtrsm( 'L','U','N','N', nr,nr,one, a,lda, v,ldv )
1433  ELSE
1434 * .. R1 is well conditioned, but non-square. Transpose(R2)
1435 * is inverted to get the product of the Jacobi rotations
1436 * used in DGESVJ. The Q-factor from the second QR
1437 * factorization is then built in explicitly.
1438  CALL dtrsm('L','U','T','N',nr,nr,one,work(2*n+1),
1439  $ n,v,ldv)
1440  IF ( nr .LT. n ) THEN
1441  CALL dlaset('A',n-nr,nr,zero,zero,v(nr+1,1),ldv)
1442  CALL dlaset('A',nr,n-nr,zero,zero,v(1,nr+1),ldv)
1443  CALL dlaset('A',n-nr,n-nr,zero,one,v(nr+1,nr+1),ldv)
1444  END IF
1445  CALL dormqr('L','N',n,n,nr,work(2*n+1),n,work(n+1),
1446  $ v,ldv,work(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr)
1447  END IF
1448 *
1449  ELSE IF ( condr2 .LT. cond_ok ) THEN
1450 *
1451 * :) .. the input matrix A is very likely a relative of
1452 * the Kahan matrix :)
1453 * The matrix R2 is inverted. The solution of the matrix equation
1454 * is Q3^T*V3 = the product of the Jacobi rotations (appplied to
1455 * the lower triangular L3 from the LQ factorization of
1456 * R2=L3*Q3), pre-multiplied with the transposed Q3.
1457  CALL dgesvj( 'L', 'U', 'N', nr, nr, v, ldv, sva, nr, u,
1458  $ ldu, work(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr, info )
1459  scalem = work(2*n+n*nr+nr+1)
1460  numrank = idnint(work(2*n+n*nr+nr+2))
1461  DO 3870 p = 1, nr
1462  CALL dcopy( nr, v(1,p), 1, u(1,p), 1 )
1463  CALL dscal( nr, sva(p), u(1,p), 1 )
1464  3870 CONTINUE
1465  CALL dtrsm('L','U','N','N',nr,nr,one,work(2*n+1),n,u,ldu)
1466 * .. apply the permutation from the second QR factorization
1467  DO 873 q = 1, nr
1468  DO 872 p = 1, nr
1469  work(2*n+n*nr+nr+iwork(n+p)) = u(p,q)
1470  872 CONTINUE
1471  DO 874 p = 1, nr
1472  u(p,q) = work(2*n+n*nr+nr+p)
1473  874 CONTINUE
1474  873 CONTINUE
1475  IF ( nr .LT. n ) THEN
1476  CALL dlaset( 'A',n-nr,nr,zero,zero,v(nr+1,1),ldv )
1477  CALL dlaset( 'A',nr,n-nr,zero,zero,v(1,nr+1),ldv )
1478  CALL dlaset( 'A',n-nr,n-nr,zero,one,v(nr+1,nr+1),ldv )
1479  END IF
1480  CALL dormqr( 'L','N',n,n,nr,work(2*n+1),n,work(n+1),
1481  $ v,ldv,work(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
1482  ELSE
1483 * Last line of defense.
1484 * #:( This is a rather pathological case: no scaled condition
1485 * improvement after two pivoted QR factorizations. Other
1486 * possibility is that the rank revealing QR factorization
1487 * or the condition estimator has failed, or the COND_OK
1488 * is set very close to ONE (which is unnecessary). Normally,
1489 * this branch should never be executed, but in rare cases of
1490 * failure of the RRQR or condition estimator, the last line of
1491 * defense ensures that DGEJSV completes the task.
1492 * Compute the full SVD of L3 using DGESVJ with explicit
1493 * accumulation of Jacobi rotations.
1494  CALL dgesvj( 'L', 'U', 'V', nr, nr, v, ldv, sva, nr, u,
1495  $ ldu, work(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr, info )
1496  scalem = work(2*n+n*nr+nr+1)
1497  numrank = idnint(work(2*n+n*nr+nr+2))
1498  IF ( nr .LT. n ) THEN
1499  CALL dlaset( 'A',n-nr,nr,zero,zero,v(nr+1,1),ldv )
1500  CALL dlaset( 'A',nr,n-nr,zero,zero,v(1,nr+1),ldv )
1501  CALL dlaset( 'A',n-nr,n-nr,zero,one,v(nr+1,nr+1),ldv )
1502  END IF
1503  CALL dormqr( 'L','N',n,n,nr,work(2*n+1),n,work(n+1),
1504  $ v,ldv,work(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
1505 *
1506  CALL dormlq( 'L', 'T', nr, nr, nr, work(2*n+1), n,
1507  $ work(2*n+n*nr+1), u, ldu, work(2*n+n*nr+nr+1),
1508  $ lwork-2*n-n*nr-nr, ierr )
1509  DO 773 q = 1, nr
1510  DO 772 p = 1, nr
1511  work(2*n+n*nr+nr+iwork(n+p)) = u(p,q)
1512  772 CONTINUE
1513  DO 774 p = 1, nr
1514  u(p,q) = work(2*n+n*nr+nr+p)
1515  774 CONTINUE
1516  773 CONTINUE
1517 *
1518  END IF
1519 *
1520 * Permute the rows of V using the (column) permutation from the
1521 * first QRF. Also, scale the columns to make them unit in
1522 * Euclidean norm. This applies to all cases.
1523 *
1524  temp1 = dsqrt(dble(n)) * epsln
1525  DO 1972 q = 1, n
1526  DO 972 p = 1, n
1527  work(2*n+n*nr+nr+iwork(p)) = v(p,q)
1528  972 CONTINUE
1529  DO 973 p = 1, n
1530  v(p,q) = work(2*n+n*nr+nr+p)
1531  973 CONTINUE
1532  xsc = one / dnrm2( n, v(1,q), 1 )
1533  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1534  $ CALL dscal( n, xsc, v(1,q), 1 )
1535  1972 CONTINUE
1536 * At this moment, V contains the right singular vectors of A.
1537 * Next, assemble the left singular vector matrix U (M x N).
1538  IF ( nr .LT. m ) THEN
1539  CALL dlaset( 'A', m-nr, nr, zero, zero, u(nr+1,1), ldu )
1540  IF ( nr .LT. n1 ) THEN
1541  CALL dlaset('A',nr,n1-nr,zero,zero,u(1,nr+1),ldu)
1542  CALL dlaset('A',m-nr,n1-nr,zero,one,u(nr+1,nr+1),ldu)
1543  END IF
1544  END IF
1545 *
1546 * The Q matrix from the first QRF is built into the left singular
1547 * matrix U. This applies to all cases.
1548 *
1549  CALL dormqr( 'Left', 'No_Tr', m, n1, n, a, lda, work, u,
1550  $ ldu, work(n+1), lwork-n, ierr )
1551 
1552 * The columns of U are normalized. The cost is O(M*N) flops.
1553  temp1 = dsqrt(dble(m)) * epsln
1554  DO 1973 p = 1, nr
1555  xsc = one / dnrm2( m, u(1,p), 1 )
1556  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1557  $ CALL dscal( m, xsc, u(1,p), 1 )
1558  1973 CONTINUE
1559 *
1560 * If the initial QRF is computed with row pivoting, the left
1561 * singular vectors must be adjusted.
1562 *
1563  IF ( rowpiv )
1564  $ CALL dlaswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1565 *
1566  ELSE
1567 *
1568 * .. the initial matrix A has almost orthogonal columns and
1569 * the second QRF is not needed
1570 *
1571  CALL dlacpy( 'Upper', n, n, a, lda, work(n+1), n )
1572  IF ( l2pert ) THEN
1573  xsc = dsqrt(small)
1574  DO 5970 p = 2, n
1575  temp1 = xsc * work( n + (p-1)*n + p )
1576  DO 5971 q = 1, p - 1
1577  work(n+(q-1)*n+p)=-dsign(temp1,work(n+(p-1)*n+q))
1578  5971 CONTINUE
1579  5970 CONTINUE
1580  ELSE
1581  CALL dlaset( 'Lower',n-1,n-1,zero,zero,work(n+2),n )
1582  END IF
1583 *
1584  CALL dgesvj( 'Upper', 'U', 'N', n, n, work(n+1), n, sva,
1585  $ n, u, ldu, work(n+n*n+1), lwork-n-n*n, info )
1586 *
1587  scalem = work(n+n*n+1)
1588  numrank = idnint(work(n+n*n+2))
1589  DO 6970 p = 1, n
1590  CALL dcopy( n, work(n+(p-1)*n+1), 1, u(1,p), 1 )
1591  CALL dscal( n, sva(p), work(n+(p-1)*n+1), 1 )
1592  6970 CONTINUE
1593 *
1594  CALL dtrsm( 'Left', 'Upper', 'NoTrans', 'No UD', n, n,
1595  $ one, a, lda, work(n+1), n )
1596  DO 6972 p = 1, n
1597  CALL dcopy( n, work(n+p), n, v(iwork(p),1), ldv )
1598  6972 CONTINUE
1599  temp1 = dsqrt(dble(n))*epsln
1600  DO 6971 p = 1, n
1601  xsc = one / dnrm2( n, v(1,p), 1 )
1602  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1603  $ CALL dscal( n, xsc, v(1,p), 1 )
1604  6971 CONTINUE
1605 *
1606 * Assemble the left singular vector matrix U (M x N).
1607 *
1608  IF ( n .LT. m ) THEN
1609  CALL dlaset( 'A', m-n, n, zero, zero, u(n+1,1), ldu )
1610  IF ( n .LT. n1 ) THEN
1611  CALL dlaset( 'A',n, n1-n, zero, zero, u(1,n+1),ldu )
1612  CALL dlaset( 'A',m-n,n1-n, zero, one,u(n+1,n+1),ldu )
1613  END IF
1614  END IF
1615  CALL dormqr( 'Left', 'No Tr', m, n1, n, a, lda, work, u,
1616  $ ldu, work(n+1), lwork-n, ierr )
1617  temp1 = dsqrt(dble(m))*epsln
1618  DO 6973 p = 1, n1
1619  xsc = one / dnrm2( m, u(1,p), 1 )
1620  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1621  $ CALL dscal( m, xsc, u(1,p), 1 )
1622  6973 CONTINUE
1623 *
1624  IF ( rowpiv )
1625  $ CALL dlaswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1626 *
1627  END IF
1628 *
1629 * end of the >> almost orthogonal case << in the full SVD
1630 *
1631  ELSE
1632 *
1633 * This branch deploys a preconditioned Jacobi SVD with explicitly
1634 * accumulated rotations. It is included as optional, mainly for
1635 * experimental purposes. It does perfom well, and can also be used.
1636 * In this implementation, this branch will be automatically activated
1637 * if the condition number sigma_max(A) / sigma_min(A) is predicted
1638 * to be greater than the overflow threshold. This is because the
1639 * a posteriori computation of the singular vectors assumes robust
1640 * implementation of BLAS and some LAPACK procedures, capable of working
1641 * in presence of extreme values. Since that is not always the case, ...
1642 *
1643  DO 7968 p = 1, nr
1644  CALL dcopy( n-p+1, a(p,p), lda, v(p,p), 1 )
1645  7968 CONTINUE
1646 *
1647  IF ( l2pert ) THEN
1648  xsc = dsqrt(small/epsln)
1649  DO 5969 q = 1, nr
1650  temp1 = xsc*dabs( v(q,q) )
1651  DO 5968 p = 1, n
1652  IF ( ( p .GT. q ) .AND. ( dabs(v(p,q)) .LE. temp1 )
1653  $ .OR. ( p .LT. q ) )
1654  $ v(p,q) = dsign( temp1, v(p,q) )
1655  IF ( p .LT. q ) v(p,q) = - v(p,q)
1656  5968 CONTINUE
1657  5969 CONTINUE
1658  ELSE
1659  CALL dlaset( 'U', nr-1, nr-1, zero, zero, v(1,2), ldv )
1660  END IF
1661 
1662  CALL dgeqrf( n, nr, v, ldv, work(n+1), work(2*n+1),
1663  $ lwork-2*n, ierr )
1664  CALL dlacpy( 'L', n, nr, v, ldv, work(2*n+1), n )
1665 *
1666  DO 7969 p = 1, nr
1667  CALL dcopy( nr-p+1, v(p,p), ldv, u(p,p), 1 )
1668  7969 CONTINUE
1669 
1670  IF ( l2pert ) THEN
1671  xsc = dsqrt(small/epsln)
1672  DO 9970 q = 2, nr
1673  DO 9971 p = 1, q - 1
1674  temp1 = xsc * dmin1(dabs(u(p,p)),dabs(u(q,q)))
1675  u(p,q) = - dsign( temp1, u(q,p) )
1676  9971 CONTINUE
1677  9970 CONTINUE
1678  ELSE
1679  CALL dlaset('U', nr-1, nr-1, zero, zero, u(1,2), ldu )
1680  END IF
1681 
1682  CALL dgesvj( 'G', 'U', 'V', nr, nr, u, ldu, sva,
1683  $ n, v, ldv, work(2*n+n*nr+1), lwork-2*n-n*nr, info )
1684  scalem = work(2*n+n*nr+1)
1685  numrank = idnint(work(2*n+n*nr+2))
1686 
1687  IF ( nr .LT. n ) THEN
1688  CALL dlaset( 'A',n-nr,nr,zero,zero,v(nr+1,1),ldv )
1689  CALL dlaset( 'A',nr,n-nr,zero,zero,v(1,nr+1),ldv )
1690  CALL dlaset( 'A',n-nr,n-nr,zero,one,v(nr+1,nr+1),ldv )
1691  END IF
1692 
1693  CALL dormqr( 'L','N',n,n,nr,work(2*n+1),n,work(n+1),
1694  $ v,ldv,work(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
1695 *
1696 * Permute the rows of V using the (column) permutation from the
1697 * first QRF. Also, scale the columns to make them unit in
1698 * Euclidean norm. This applies to all cases.
1699 *
1700  temp1 = dsqrt(dble(n)) * epsln
1701  DO 7972 q = 1, n
1702  DO 8972 p = 1, n
1703  work(2*n+n*nr+nr+iwork(p)) = v(p,q)
1704  8972 CONTINUE
1705  DO 8973 p = 1, n
1706  v(p,q) = work(2*n+n*nr+nr+p)
1707  8973 CONTINUE
1708  xsc = one / dnrm2( n, v(1,q), 1 )
1709  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1710  $ CALL dscal( n, xsc, v(1,q), 1 )
1711  7972 CONTINUE
1712 *
1713 * At this moment, V contains the right singular vectors of A.
1714 * Next, assemble the left singular vector matrix U (M x N).
1715 *
1716  IF ( nr .LT. m ) THEN
1717  CALL dlaset( 'A', m-nr, nr, zero, zero, u(nr+1,1), ldu )
1718  IF ( nr .LT. n1 ) THEN
1719  CALL dlaset( 'A',nr, n1-nr, zero, zero, u(1,nr+1),ldu )
1720  CALL dlaset( 'A',m-nr,n1-nr, zero, one,u(nr+1,nr+1),ldu )
1721  END IF
1722  END IF
1723 *
1724  CALL dormqr( 'Left', 'No Tr', m, n1, n, a, lda, work, u,
1725  $ ldu, work(n+1), lwork-n, ierr )
1726 *
1727  IF ( rowpiv )
1728  $ CALL dlaswp( n1, u, ldu, 1, m-1, iwork(2*n+1), -1 )
1729 *
1730 *
1731  END IF
1732  IF ( transp ) THEN
1733 * .. swap U and V because the procedure worked on A^t
1734  DO 6974 p = 1, n
1735  CALL dswap( n, u(1,p), 1, v(1,p), 1 )
1736  6974 CONTINUE
1737  END IF
1738 *
1739  END IF
1740 * end of the full SVD
1741 *
1742 * Undo scaling, if necessary (and possible)
1743 *
1744  IF ( uscal2 .LE. (big/sva(1))*uscal1 ) THEN
1745  CALL dlascl( 'G', 0, 0, uscal1, uscal2, nr, 1, sva, n, ierr )
1746  uscal1 = one
1747  uscal2 = one
1748  END IF
1749 *
1750  IF ( nr .LT. n ) THEN
1751  DO 3004 p = nr+1, n
1752  sva(p) = zero
1753  3004 CONTINUE
1754  END IF
1755 *
1756  work(1) = uscal2 * scalem
1757  work(2) = uscal1
1758  IF ( errest ) work(3) = sconda
1759  IF ( lsvec .AND. rsvec ) THEN
1760  work(4) = condr1
1761  work(5) = condr2
1762  END IF
1763  IF ( l2tran ) THEN
1764  work(6) = entra
1765  work(7) = entrat
1766  END IF
1767 *
1768  iwork(1) = nr
1769  iwork(2) = numrank
1770  iwork(3) = warning
1771 *
1772  RETURN
1773 * ..
1774 * .. END OF DGEJSV
1775 * ..
subroutine dgelqf(M, N, A, LDA, TAU, WORK, LWORK, INFO)
DGELQF
Definition: dgelqf.f:137
subroutine dgesvj(JOBA, JOBU, JOBV, M, N, A, LDA, SVA, MV, V, LDV, WORK, LWORK, INFO)
DGESVJ
Definition: dgesvj.f:337
subroutine dlaswp(N, A, LDA, K1, K2, IPIV, INCX)
DLASWP performs a series of row interchanges on a general rectangular matrix.
Definition: dlaswp.f:116
subroutine dcopy(N, DX, INCX, DY, INCY)
DCOPY
Definition: dcopy.f:53
subroutine dgeqrf(M, N, A, LDA, TAU, WORK, LWORK, INFO)
DGEQRF
Definition: dgeqrf.f:138
subroutine dormqr(SIDE, TRANS, M, N, K, A, LDA, TAU, C, LDC, WORK, LWORK, INFO)
DORMQR
Definition: dormqr.f:171
subroutine xerbla(SRNAME, INFO)
XERBLA
Definition: xerbla.f:62
subroutine dormlq(SIDE, TRANS, M, N, K, A, LDA, TAU, C, LDC, WORK, LWORK, INFO)
DORMLQ
Definition: dormlq.f:171
subroutine dlascl(TYPE, KL, KU, CFROM, CTO, M, N, A, LDA, INFO)
DLASCL multiplies a general rectangular matrix by a real scalar defined as cto/cfrom.
Definition: dlascl.f:141
logical function lsame(CA, CB)
LSAME
Definition: lsame.f:55
subroutine dorgqr(M, N, K, A, LDA, TAU, WORK, LWORK, INFO)
DORGQR
Definition: dorgqr.f:130
subroutine dscal(N, DA, DX, INCX)
DSCAL
Definition: dscal.f:55
subroutine dtrsm(SIDE, UPLO, TRANSA, DIAG, M, N, ALPHA, A, LDA, B, LDB)
DTRSM
Definition: dtrsm.f:183
subroutine dlassq(N, X, INCX, SCALE, SUMSQ)
DLASSQ updates a sum of squares represented in scaled form.
Definition: dlassq.f:105
double precision function dlamch(CMACH)
DLAMCH
Definition: dlamch.f:65
integer function idamax(N, DX, INCX)
IDAMAX
Definition: idamax.f:53
subroutine dlacpy(UPLO, M, N, A, LDA, B, LDB)
DLACPY copies all or part of one two-dimensional array to another.
Definition: dlacpy.f:105
double precision function dnrm2(N, X, INCX)
DNRM2
Definition: dnrm2.f:56
subroutine dswap(N, DX, INCX, DY, INCY)
DSWAP
Definition: dswap.f:53
subroutine dlaset(UPLO, M, N, ALPHA, BETA, A, LDA)
DLASET initializes the off-diagonal elements and the diagonal elements of a matrix to given values...
Definition: dlaset.f:112
subroutine dpocon(UPLO, N, A, LDA, ANORM, RCOND, WORK, IWORK, INFO)
DPOCON
Definition: dpocon.f:123
subroutine dgeqp3(M, N, A, LDA, JPVT, TAU, WORK, LWORK, INFO)
DGEQP3
Definition: dgeqp3.f:153

Here is the call graph for this function:

Here is the caller graph for this function: