Please refer to the Debian Install Documentation, Setting up a Slave DNS Server
The following are detailed elsewhere in the documentation
/var/log/dms/dmsdmd.log* dmsdmd logs /var/log/local7.log DMS named logs /var/log/syslog Basically everything /etc/dms/dms.conf dmsdmd, WSGI and zone_tool configuration file /etc/dms Various passwords, templates and things
See Named.conf and Zone Templating for more details.
Check that named, postgres, and dmsdmd are running on the master.
Using zone_tool show_dms_status on master server:
zone_tool > show_dms_status
show_master_status:
MASTER_SERVER: dms-akl
NAMED master configuration state:
hold_sg: HOLD_SG_NONE
hold_sg_name: None
hold_start: Wed Nov 7 16:52:36 2012
hold_stop: Wed Nov 7 17:02:36 2012
replica_sg_name: vygr-replica
state: HOLD
show_replica_sg:
sg_name: vygr-replica
config_dir: /etc/net24/server-config-templates
master_address: 2406:1e00:1001:1::2
master_alt_address: 2406:3e00:1001:1::2
replica_sg: True
zone_count: 37
Replica SG named status:
dms-chc 2406:3e00:1001:1::2
OK
ls_server:
dms-akl Wed Nov 7 16:52:46 2012 OK
2406:1e00:1001:1::2 None
ping: 5 packets transmitted, 5 received, 0.00% packet loss
dms-chc Wed Nov 7 16:52:46 2012 OK
2406:3e00:1001:1::2 210.5.48.242
ping: 5 packets transmitted, 5 received, 0.00% packet loss
dms-s1-akl Wed Nov 7 16:31:04 2012 RETRY
2406:1e00:1001:2::2 103.4.136.226
ping: 5 packets transmitted, 5 received, 0.00% packet loss
retry_msg:
Server 'dms-s1-akl': SOA query - timeout waiting for
response, retrying
dms-s1-chc Wed Nov 7 16:52:46 2012 OK
2406:3e00:1001:2::2 210.5.48.226
ping: 5 packets transmitted, 5 received, 0.00% packet loss
list_pending_events:
ServerSMConfigure dms-s1-akl Wed Nov 7 16:57:22
2012
ServerSMCheckServer dms-chc Wed Nov 7 16:53:55
2012
ServerSMCheckServer dms-akl Wed Nov 7 16:55:46
2012
ServerSMCheckServer dms-s1-chc Wed Nov 7 16:57:06
2012
MasterSMHoldTimeout Wed Nov 7 17:02:36
2012
zone_tool >
* Check Master server name, that machine is actually the master.
* Check master state, ``HOLD`` means named reconfigured in the last 10
minutes.
* All servers shown at bottom should be in ``OK`` or ``CONFIG`` states,
staying in ``RETRY`` or ``BROKEN`` means server may not be contactable.
``RETRY`` or ``BROKEN`` states should also have a ``retry_msg:`` field
giving the associated log message.
* :command:`list_pending_events` shows events that have to be processed.
* Any events that are scheduled in the past may indicate :command:`dmsdmd` having
serious problems.
On the Replica:
dms-chc: -root- [~]
# dms_promote_replica
+ perl -pe s/^#(\s*local7.* :ompgsql:\S+,dms,rsyslog,.*$)/\1/ -i
/etc/rsyslog.d/pgsql.conf
+ set +x
[ ok ] Stopping enhanced syslogd: rsyslogd.
[ ok ] Starting enhanced syslogd: rsyslogd.
+ perl -pe s/^NET24DMD_ENABLE=.*$/NET24DMD_ENABLE=true/ -i
/etc/default/net24dmd
+ perl -pe s/^OPTIONS=.*$/OPTIONS="-u bind"/ -i /etc/default/bind9
+ set +x
[....] Stopping domain name service...: bind9waiting for pid 8744 to die
. ok
[ ok ] Starting domain name service...: bind9.
[ ok ] Starting net24dmd: net24dmd.
+ zone_tool write_rndc_conf
+ zone_tool reconfig_all
+ perl -pe s/^#+(.*zone_tool vacuum_all)$/\1/ -i /etc/cron.d/dms-core
+ do_dms_wsgi
+ return 0
+ perl -pe s/^(\s*exit\s+0.*$)/#\1/ -i /etc/default/apache2
+ set +x
[ ok ] Starting web server: apache2.
dms-chc: -root- [~]
#
Wait till servers started, and then use zone_tool show_dms_status to check that everything becomes OK. This may take 15 minutes. The section about ls_pending_events will give scheduled times for servers to become configured.
dms-chc: -root- [~]
# zone_tool show_dms_status
show_master_status:
MASTER_SERVER: dms-chc
NAMED master configuration state:
hold_sg: HOLD_SG_NONE
hold_sg_name: None
hold_start: Fri Nov 9 08:30:49 2012
hold_stop: Fri Nov 9 08:40:49 2012
replica_sg_name: vygr-replica
state: HOLD
show_replica_sg:
sg_name: vygr-replica
config_dir: /etc/net24/server-config-templates
master_address: 2406:1e00:1001:1::2
master_alt_address: 2406:3e00:1001:1::2
replica_sg: True
zone_count: 37
Replica SG named status:
dms-akl 2406:1e00:1001:1::2
RETRY
ls_server:
dms-akl Fri Nov 9 08:23:08 2012 RETRY
2406:1e00:1001:1::2 None
ping: 5 packets transmitted, 5 received, 0.00% packet loss
retry_msg:
Server 'dms-akl': SOA query - timeout waiting for response,
retrying
dms-chc Fri Nov 9 08:30:58 2012 OK
2406:3e00:1001:1::2 210.5.48.242
ping: 5 packets transmitted, 5 received, 0.00% packet loss
dms-s1-akl Fri Nov 9 08:30:58 2012 OK
2406:1e00:1001:2::2 103.4.136.226
ping: 5 packets transmitted, 5 received, 0.00% packet loss
dms-s1-chc Fri Nov 9 08:30:58 2012 OK
2406:3e00:1001:2::2 210.5.48.226
ping: 5 packets transmitted, 5 received, 0.00% packet loss
list_pending_events:
ServerSMCheckServer dms-chc Fri Nov 9 08:39:53
2012
MasterSMHoldTimeout Fri Nov 9 08:40:49
2012
ServerSMCheckServer dms-s1-chc Fri Nov 9 08:40:08
2012
ServerSMCheckServer dms-s1-akl Fri Nov 9 08:36:01
2012
ServerSMConfigure dms-akl Fri Nov 9 08:50:17
2012
dms-chc: -root- [~]
#
A new replica will need to be installed as per DMS Master Server Install
zone_tool > show_zonesm wham-blam.org
name: wham-blam.org.
alt_sg_name: None
auto_dnssec: False
ctime: Thu Aug 23 10:51:14 2012
deleted_start: None
edit_lock: True
edit_lock_token: None
inc_updates: False
lock_state: EDIT_UNLOCK
locked_at: None
locked_by: None
mtime: Thu Aug 23 10:51:14 2012
nsec3: True
reference: nutty-nutty@ANATHOTH-NET
sg_name: anathoth-internal
soa_serial: 2012091400
state: UNCONFIG
use_apex_ns: True
zi_candidate_id: 102880
zi_id: 102880
zone_id: 101448
zone_type: DynDNSZoneSM
zi_id: 102880
change_by: grantma@shalom-ext.internal.anathoth.net/Admin
ctime: Fri Sep 14 10:55:59 2012
mtime: Fri Sep 14 11:12:10 2012
ptime: Fri Sep 14 11:12:10 2012
soa_expire: 7d
soa_minimum: 600
soa_mname: ns1.internal.anathoth.net.
soa_refresh: 24h
soa_retry: 900
soa_rname: matthewgrant5.gmail.com.
soa_serial: 2012091400
soa_ttl: None
zone_id: 101448
zone_ttl: 24h
Maybe as above. Can be caused by:
Failed events (manually failed or otherwise, Events queue deleted in DB, permissions problems as follows)
Permissions problems on the master server on the /var/lib/bind/dynamic directory - should be:
# ls -ld /var/lib/bind/dynamic/ drwxrwsr-x 2 bind dmsdmd 487424 Nov 9 08:47 /var/lib/bind/dynamic/
Do a reset_zonesm wham-blam.org, (noting y/N and DNSSEC RRSIGs being destroyed):
zone_tool > reset_zonesm wham-blam.org.
*** WARNING - doing this destroys DNSSEC RRSIG data.
*** Do really you wish to do this?
--y/[N]> y
zone_tool >
And check again:
zone_tool > show_zonesm wham-blam.org
name: wham-blam.org.
alt_sg_name: None
auto_dnssec: False
ctime: Thu Aug 23 10:51:14 2012
deleted_start: None
edit_lock: True
edit_lock_token: None
inc_updates: False
lock_state: EDIT_UNLOCK
locked_at: None
locked_by: None
mtime: Thu Aug 23 10:51:14 2012
nsec3: True
reference: nutty-nutty@ANATHOTH-NET
sg_name: anathoth-internal
soa_serial: 2012091400
state: RESET
use_apex_ns: True
zi_candidate_id: 102880
zi_id: 102880
zone_id: 101448
zone_type: DynDNSZoneSM
zi_id: 102880
change_by: grantma@shalom-ext.internal.anathoth.net/Admin
ctime: Fri Sep 14 10:55:59 2012
mtime: Fri Sep 14 11:12:10 2012
ptime: Fri Sep 14 11:12:10 2012
soa_expire: 7d
soa_minimum: 600
soa_mname: ns1.internal.anathoth.net.
soa_refresh: 24h
soa_retry: 900
soa_rname: matthewgrant5.gmail.com.
soa_serial: 2012091400
soa_ttl: None
zone_id: 101448
zone_ttl: 24h
And then use show_zonesm to check that zone state goes to PUBLISHED within 15 minutes. ls_pending_events may also be useful, as it will show the events to do with the zone being published.
show_zonesm wham-blam.org
name: wham-blam.org.
alt_sg_name: None
auto_dnssec: False
ctime: Thu Aug 23 10:51:14 2012
deleted_start: None
edit_lock: True
edit_lock_token: None
inc_updates: False
lock_state: EDIT_UNLOCK
locked_at: None
locked_by: None
mtime: Thu Aug 23 10:51:14 2012
nsec3: True
reference: nutty-nutty@ANATHOTH-NET
sg_name: anathoth-internal
soa_serial: 2012091400
state: RESET
use_apex_ns: True
zi_candidate_id: 102880
zi_id: 102880
zone_id: 101448
zone_type: DynDNSZoneSM
zi_id: 102880
change_by: grantma@shalom-ext.internal.anathoth.net/Admin
ctime: Fri Sep 14 10:55:59 2012
mtime: Fri Sep 14 11:12:10 2012
ptime: Fri Sep 14 11:12:10 2012
soa_expire: 7d
soa_minimum: 600
soa_mname: ns1.internal.anathoth.net.
soa_refresh: 24h
soa_retry: 900
soa_rname: matthewgrant5.gmail.com.
soa_serial: 2012091400
soa_ttl: None
zone_id: 101448
zone_ttl: 24h
zone_tool > ls_pending_events
ServerSMCheckServer shalom Fri Nov 9 08:50:35
2012
ServerSMCheckServer shalom-ext Fri Nov 9 08:50:40
2012
ServerSMCheckServer shalom-dr Fri Nov 9 08:50:46
2012
ServerSMCheckServer dns-slave1 Fri Nov 9 08:50:53
2012
ServerSMConfigure en-gedi-auth Fri Nov 9 08:55:31
2012
ZoneSMConfig wham-blam.org. Fri Nov 9 08:47:07
2012
MasterSMHoldTimeout Fri Nov 9 08:56:52
2012
ServerSMCheckServer dns-slave0 Fri Nov 9 08:54:29
2012
zone_tool > show_zonesm wham-blam.org
name: wham-blam.org.
alt_sg_name: None
auto_dnssec: False
ctime: Thu Aug 23 10:51:14 2012
deleted_start: None
edit_lock: True
edit_lock_token: None
inc_updates: False
lock_state: EDIT_UNLOCK
locked_at: None
locked_by: None
mtime: Thu Aug 23 10:51:14 2012
nsec3: True
reference: nutty-nutty@ANATHOTH-NET
sg_name: anathoth-internal
soa_serial: 2012091400
state: UNCONFIG
use_apex_ns: True
zi_candidate_id: 102880
zi_id: 102880
zone_id: 101448
zone_type: DynDNSZoneSM
zi_id: 102880
change_by: grantma@shalom-ext.internal.anathoth.net/Admin
ctime: Fri Sep 14 10:55:59 2012
mtime: Fri Sep 14 11:12:10 2012
ptime: Fri Sep 14 11:12:10 2012
soa_expire: 7d
soa_minimum: 600
soa_mname: ns1.internal.anathoth.net.
soa_refresh: 24h
soa_retry: 900
soa_rname: matthewgrant5.gmail.com.
soa_serial: 2012091400
soa_ttl: None
zone_id: 101448
zone_ttl: 24h
zone_tool > ls_pending_events
ServerSMCheckServer shalom Fri Nov 9 08:50:35
2012
ServerSMCheckServer shalom-ext Fri Nov 9 08:50:40
2012
ServerSMCheckServer shalom-dr Fri Nov 9 08:50:46
2012
ServerSMCheckServer dns-slave1 Fri Nov 9 08:50:53
2012
ServerSMConfigure en-gedi-auth Fri Nov 9 08:55:31
2012
MasterSMHoldTimeout Fri Nov 9 08:56:52
2012
ServerSMCheckServer dns-slave0 Fri Nov 9 08:54:29
2012
ZoneSMReconfigUpdate wham-blam.org. Fri Nov 9 08:57:10
2012
zone_tool > ls_pending_events
ServerSMCheckServer shalom-ext Fri Nov 9 09:00:25
2012
ServerSMCheckServer shalom-dr Fri Nov 9 09:00:44
2012
ServerSMCheckServer dns-slave0 Fri Nov 9 09:01:25
2012
ServerSMCheckServer dns-slave1 Fri Nov 9 09:02:11
2012
ServerSMConfigure en-gedi-auth Fri Nov 9 09:06:15
2012
MasterSMHoldTimeout Fri Nov 9 09:06:57
2012
ServerSMCheckServer shalom Fri Nov 9 09:05:11
2012
zone_tool > show_zonesm wham-blam.org
name: wham-blam.org.
alt_sg_name: None
auto_dnssec: False
ctime: Thu Aug 23 10:51:14 2012
deleted_start: None
edit_lock: True
edit_lock_token: None
inc_updates: False
lock_state: EDIT_UNLOCK
locked_at: None
locked_by: None
mtime: Thu Aug 23 10:51:14 2012
nsec3: True
reference: nutty-nutty@ANATHOTH-NET
sg_name: anathoth-internal
soa_serial: 2012091400
state: PUBLISHED
use_apex_ns: True
zi_candidate_id: 102880
zi_id: 102880
zone_id: 101448
zone_type: DynDNSZoneSM
zi_id: 102880
change_by: grantma@shalom-ext.internal.anathoth.net/Admin
ctime: Fri Sep 14 10:55:59 2012
mtime: Fri Nov 9 08:57:13 2012
ptime: Fri Nov 9 08:57:13 2012
soa_expire: 7d
soa_minimum: 600
soa_mname: ns1.internal.anathoth.net.
soa_refresh: 24h
soa_retry: 900
soa_rname: matthewgrant5.gmail.com.
soa_serial: 2012091400
soa_ttl: None
zone_id: 101448
zone_ttl: 24h
zone_tool >
Can be caused by:
Failed MasterSMHoldTimeout events (manually failed or otherwise, Events queue deleted in DB etc)
Permissions problems on the master server on the /etc/bind/master-config directory - Should be 2755 dmsdmd:bind:
shalom-ext: -grantma- [~] $ ls -ld /etc/bind/master-config drwxr-sr-x 2 net24dmd bind 4096 Nov 9 08:56 /etc/bind/master-config
This shows up in zone_tool show_dms_status:
zone_tool > show_dms_status
show_master_status:
MASTER_SERVER: dms-akl
NAMED master configuration state:
hold_sg: HOLD_SG_NONE
hold_sg_name: None
hold_start: Wed Nov 7 16:52:36 2012
hold_stop: Wed Nov 7 17:02:36 2012
replica_sg_name: vygr-replica
state: HOLD
show_replica_sg:
sg_name: vygr-replica
config_dir: /etc/net24/server-config-templates
master_address: 2406:1e00:1001:1::2
master_alt_address: 2406:3e00:1001:1::2
replica_sg: True
zone_count: 37
Replica SG named status:
dms-chc 2406:3e00:1001:1::2
OK
ls_server:
dms-akl Wed Nov 7 16:52:46 2012 OK
2406:1e00:1001:1::2 None
ping: 5 packets transmitted, 5 received, 0.00% packet loss
dms-chc Wed Nov 7 16:52:46 2012 OK
2406:3e00:1001:1::2 210.5.48.242
ping: 5 packets transmitted, 5 received, 0.00% packet loss
dms-s1-akl Wed Nov 7 16:31:04 2012 RETRY
2406:1e00:1001:2::2 103.4.136.226
ping: 5 packets transmitted, 5 received, 0.00% packet loss
retry_msg:
Server 'dms-s1-akl': SOA query - timeout waiting for
response, retrying
dms-s1-chc Wed Nov 7 16:52:46 2012 OK
2406:3e00:1001:2::2 210.5.48.226
ping: 5 packets transmitted, 5 received, 0.00% packet loss
list_pending_events:
ServerSMConfigure dms-s1-akl Wed Nov 7 16:57:22
2012
ServerSMCheckServer dms-chc Wed Nov 7 16:53:55
2012
ServerSMCheckServer dms-akl Wed Nov 7 16:55:46
2012
ServerSMCheckServer dms-s1-chc Wed Nov 7 16:57:06
2012
zone_tool > exit
dms-akl: -root- [~]
# date
Wed Nov 7 16:54:42 NZDT 2012
Key things to look for:
- master status section shows hold_start and hold_stop being in the past
- there is no MasterSMHoldTimeout event
Note
The MasterSM state machine forward posts the MasterSMHoldTimeout event when entering the HOLD state. If it does not get created or disappears or fails due to unforeseen events with outages etc, the MasterSM will end up stuck as above.
The fix is to do zone_tool reset_master. This will reset the MasterSM state machine.
Just like the Master state machine getting stuck because of a missing MasterSMHoldTimeout event, Server SMs can end up being stuck in the CONFIG, RETRY or BROKEN states due to missing events. There will be missing ServerSMConfigure events for the server in the ls_pending_events output:
zone_tool > show_dms_status
show_master_status:
MASTER_SERVER: shalom-ext
NAMED master configuration state:
hold_sg: HOLD_SG_NONE
hold_sg_name: None
hold_start: None
hold_stop: None
replica_sg_name: anathoth-replica
state: READY
show_replica_sg:
sg_name: anathoth-replica
config_dir: /etc/bind/anathoth-master
master_address: 2001:470:f012:2::2
master_alt_address: 2001:470:f012:2::3
replica_sg: True
zone_count: 14
Replica SG named status:
shalom-dr 2001:470:f012:2::3
OK
ls_server:
dns-slave0 Fri Nov 9 09:56:48 2012 OK
2001:470:c:110e::2 111.65.238.10
ping: 5 packets transmitted, 5 received, 0.00% packet loss
dns-slave1 Fri Nov 9 09:56:38 2012 OK
2001:470:66:23::2 111.65.238.11
ping: 5 packets transmitted, 5 received, 0.00% packet loss
en-gedi-auth Thu Nov 8 18:01:07 2012 RETRY
fd14:828:ba69:6:5054:ff:fe39:54f9 172.31.12.2
ping: 5 packets transmitted, 0 received, 100.00% packet loss
retry_msg:
Server 'en-gedi-auth': failed to rsync include files,
Command '['rsync', '--quiet', '-av', '--password-file',
'/etc/net24/rsync-dnsconf-password', '/var/lib/net24/dms-sg
/anathoth-internal/',
'dnsconf@[fd14:828:ba69:6:5054:ff:fe39:54f9]::dnsconf/']'
returned non-zero exit status 10, rsync: failed to connect
to fd14:828:ba69:6:5054:ff:fe39:54f9
(fd14:828:ba69:6:5054:ff:fe39:54f9): Connection timed out
(110), rsync error: error in socket IO (code 10) at
clientserver.c(122) [sender=3.0.9]
shalom Fri Nov 9 09:56:19 2012 OK
fd14:828:ba69:1:21c:f0ff:fefa:f3c0 192.168.110.1
ping: 5 packets transmitted, 5 received, 0.00% packet loss
shalom-dr Fri Nov 9 09:56:56 2012 OK
2001:470:f012:2::3 172.31.10.4
ping: 5 packets transmitted, 5 received, 0.00% packet loss
shalom-ext Fri Nov 9 09:58:21 2012 OK
2001:470:f012:2::2 172.31.10.2
ping: 5 packets transmitted, 5 received, 0.00% packet loss
list_pending_events:
ServerSMCheckServer shalom Fri Nov 9 10:01:43 2012
ServerSMCheckServer dns-slave1 Fri Nov 9 10:01:55 2012
ServerSMCheckServer dns-slave0 Fri Nov 9 10:03:17 2012
ServerSMCheckServer shalom-dr Fri Nov 9 10:05:25 2012
ServerSMCheckServer shalom-ext Fri Nov 9 10:04:49 2012
zone_tool >
Note
Above, the ls_server section of show_dms_status displays the reason for going to RETRY or BROKEN in the displayed retry_msg field.
The fix, reset_server the server, and use ls_pending_events to check an ServerSMConfigure event is created:
zone_tool > reset_server en-gedi-auth
zone_tool > ls_pending_events
ServerSMCheckServer shalom Fri Nov 9 12:11:17 2012
ServerSMCheckServer shalom-ext Fri Nov 9 12:11:47 2012
ServerSMCheckServer en-gedi-auth Fri Nov 9 12:14:57 2012
ServerSMCheckServer dns-slave0 Fri Nov 9 12:18:02 2012
ServerSMCheckServer shalom-dr Fri Nov 9 12:15:09 2012
ServerSMCheckServer dns-slave1 Fri Nov 9 12:19:08 2012
ServerSMConfigure en-gedi-auth Fri Nov 9 12:10:39 2012
zone_tool >
Wait until the scheduled time posted for ServerSMConfigure, and then do a zone_tool show_dms_status to make sure everything is going:
zone_tool > show_dms_status
show_master_status:
MASTER_SERVER: shalom-ext
NAMED master configuration state:
hold_sg: HOLD_SG_NONE
hold_sg_name: None
hold_start: None
hold_stop: None
replica_sg_name: anathoth-replica
state: READY
show_replica_sg:
sg_name: anathoth-replica
config_dir: /etc/bind/anathoth-master
master_address: 2001:470:f012:2::2
master_alt_address: 2001:470:f012:2::3
replica_sg: True
zone_count: 14
Replica SG named status:
shalom-dr 2001:470:f012:2::3
OK
ls_server:
dns-slave0 Fri Nov 9 12:08:29 2012 OK
2001:470:c:110e::2 111.65.238.10
ping: 5 packets transmitted, 5 received, 0.00% packet loss
dns-slave1 Fri Nov 9 12:10:19 2012 OK
2001:470:66:23::2 111.65.238.11
ping: 5 packets transmitted, 5 received, 0.00% packet loss
en-gedi-auth Fri Nov 9 12:10:43 2012 OK
fd14:828:ba69:6:5054:ff:fe39:54f9 172.31.12.2
ping: 5 packets transmitted, 5 received, 0.00% packet loss
shalom Fri Nov 9 12:11:19 2012 OK
fd14:828:ba69:1:21c:f0ff:fefa:f3c0 192.168.110.1
ping: 5 packets transmitted, 5 received, 0.00% packet loss
shalom-dr Fri Nov 9 12:09:44 2012 OK
2001:470:f012:2::3 172.31.10.4
ping: 5 packets transmitted, 5 received, 0.00% packet loss
shalom-ext Fri Nov 9 12:11:47 2012 OK
2001:470:f012:2::2 172.31.10.2
ping: 5 packets transmitted, 5 received, 0.00% packet loss
list_pending_events:
ServerSMCheckServer en-gedi-auth Fri Nov 9 12:14:57 2012
ServerSMCheckServer dns-slave0 Fri Nov 9 12:18:02 2012
ServerSMCheckServer shalom-dr Fri Nov 9 12:15:09 2012
ServerSMCheckServer dns-slave1 Fri Nov 9 12:19:08 2012
ServerSMCheckServer shalom Fri Nov 9 12:17:44 2012
ServerSMCheckServer shalom-ext Fri Nov 9 12:17:31 2012
zone_tool >
The named dynamic data in /var/lib/bind/dynamic is corrupt, or missing
Stop named and dmsdmd:
root@dms3-master:~# service bind9 stop [....] Stopping domain name service...: bind9waiting for pid 15462 to die . ok root@dms3-master:~# service net24dmd stop [ ok ] Stopping net24dmd: net24dmd.
- Check /var/lib/dms/master_config and /var/lib/bind/dynamic permissions.
/var/lib/dms/master-config, should be 2755 dmsdmd:bind:
root@dms3-master:~# ls -ld /var/lib/dms/master-config/ drwxr-sr-x 2 dmsdmd bind 4096 Nov 9 12:39 /var/lib/dms/master-config/ root@dms3-master:~# :file:`/var/lib/bind/dynamic`, should be ``2775 bind:dmsdmd``:: root@dms3-master:~# ls -ld /var/lib/bind/dynamic drwxrwsr-x 2 bind dmsdmd 1683456 Nov 9 12:39 /var/lib/bind/dynamic root@dms3-master:~#Clear any files from /var/lib/bind/dynamic if needed:
root@dms3-master:~# rm -rf /var/lib/bind/dynamic/* root@dms3-master:~#Run the restore process which recreates /etc/bind/master-config/ contents, and recreates contents of /var/lib/bind/dynamic. This may take some time. 40000 zones takes 20 - 30 minutes.
root@dms3-master:~# zone_tool restore_named_db *** WARNING - doing this destroys DNSSEC RRSIG data. It is a last resort in DR recovery. *** Do really you wish to do this? --y/[N]> yStart named and dmsdmd:
root@dms3-master:~# service dmsdmd start [ ok ] Starting dmsdmd: dmsdmd. root@dms3-master:~# service bind9 start [ ok ] Starting domain name service...: bind9. root@dms3-master:~#
The master and DR replica have the etckeeper git archive mirrored every 4 hours to the alternate server. See etckeeper and /etc on Replica and Master Servers
/etc/cron.d/dms-core does daily FULL pg_dumpall to /var/backups/postresql-9.1-dms.sql.gz, on replica and master, which are rotated for 7 days.
To recover:
# cd /var/backups
# gunzip -c postregresql-9.1-dms.sql.gz | psql -U pgsql
There will be lots of SQL output. The dump also contains DB user passwords, and ACL/permissions information, along with DB details for the whole PostgresQL ‘dms’ cluster.
Use the dns-recreateds command to recreate a domains DNSSEC DS material. The /var/lib/bind/keys directory is rsynced to the DR replica by the master server dmsdmd daemon. Use a ‘*’ argument to regenerate all DS material.
shalom-ext: -root- [/var/lib/bind/keys]
# dns-recreateds anathoth.net
+ dnssec-dsfromkey -2 /var/lib/bind/keys/Kanathoth.net.+007+57318.key
+ set +x
shalom-ext: -root- [/var/lib/bind/keys]
#
These examples are between DNS slave server dns-slave1 and master shalom-ext, using racoon, via racoon-tool in Debian Wheezy.
Note
The ICMPv6 setup is specific to this Debian Wheezy racoon setup. However, the test techniques are also applicable to usewith Strongswan and other IPSEC software.
Ping6 server from master and vice-versa to check unencrypted network level. (Transport mode encryption does not encrypt ICMPv6). Use the zone_tool ls_server -v command to get the DMS configured IPv6 addresses of both servers.
shalom-ext: -grantma- [~/dms]
$ zone_tool ls_server -v dns-slave1
dns-slave1 Mon Nov 12 13:57:20 2012 OK
2001:470:66:23::2 111.65.238.11
shalom-ext: -grantma- [~/dms]
$ zone_tool ls_server -v shalom-ext
shalom-ext Mon Nov 12 13:59:29 2012 OK
2001:470:f012:2::2 172.31.10.2
shalom-ext: -grantma- [~/dms]
$ ping6 2001:470:66:23::2
PING 2001:470:66:23::2(2001:470:66:23::2) 56 data bytes
64 bytes from 2001:470:66:23::2: icmp_seq=1 ttl=58 time=312 ms
64 bytes from 2001:470:66:23::2: icmp_seq=2 ttl=58 time=310 ms
64 bytes from 2001:470:66:23::2: icmp_seq=3 ttl=58 time=310 ms
^C
--- 2001:470:66:23::2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 310.646/311.293/312.518/0.866 ms
shalom-ext: -grantma- [~/dms]
$
Telnet domain TCP ports both ways, and rsync out to slave server from master. This checks that IPSEC encryption is running.
From shalom-ext:
shalom-ext: -grantma- [~/dms]
$ telnet 2001:470:66:23::2 53
Trying 2001:470:66:23::2...
Connected to 2001:470:66:23::2.
Escape character is '^]'.
^]c
telnet> c
Connection closed.
shalom-ext: -grantma- [~/dms]
$ telnet 2001:470:66:23::2 rsync
Trying 2001:470:66:23::2...
Connected to 2001:470:66:23::2.
Escape character is '^]'.
@RSYNCD: 30.0
^]c
telnet> c
Connection closed.
shalom-ext: -grantma- [~/dms]
$
From dns-slave1:
grantma@dns-slave1:~$ telnet 2001:470:f012:2::2 53
Trying 2001:470:f012:2::2...
Connected to 2001:470:f012:2::2.
Escape character is '^]'.
^]c
telnet> c
Connection closed.
grantma@dns-slave1:~$
If the DNS server is a DR replica, telnet the rsync port the other way also.
For racoon and strongswan, if things are not working restart the IPSEC connection at both ends:
Note
For Strongswan, use the ipsec up/down <connection-name>. ipsec status [<connection-name>] can be used to list all connections, and query about status.
racoon shalom-ext master:
shalom-ext: -root- [/home/grantma/dms]
# racoon-tool vlist
shalom-dr
dns-slave1
%anonymous
shalom-ext
shalom
dns-slave0
en-gedi-auth
shalom-ext: -root- [/home/grantma/dms]
# racoon-tool vreload dns-slave1
Reloading VPN dns-slave1...The result of line 2: No entry.
The result of line 5: No entry.
done.
shalom-ext: -root- [/home/grantma/dms]
#
racoon dns-slave1:
root@dns-slave1:/home/grantma# racoon-tool vlist
shalom-dr
%anonymous
shalom-ext
root@dns-slave1:/home/grantma# racoon-tool vreload shalom-ext
Reloading VPN shalom-ext...The result of line 2: No entry.
The result of line 5: No entry.
done.
root@dns-slave1:/home/grantma#
Note
Wait 10 minutes for IPSEC replay timing to expire. Then retry the telnet steps above.
If IPSEC still will not work:
For racoon, Use racoon-tool restart on both ends. For strongswan, use ipsec restart on both ends.
shalom-ext:
shalom-ext: -root- [/home/grantma/dms]
# racoon-tool restart
Stopping IKE (ISAKMP/Oakley) server: racoon.
Flushing SAD and SPD...
SAD and SPD flushed.
Unloading IPSEC/crypto modules...
IPSEC/crypto modules unloaded.
Loading IPSEC/crypto modules...
IPSEC/crypto modules loaded.
Flushing SAD and SPD...
SAD and SPD flushed.
Loading SAD and SPD...
SAD and SPD loaded.
Configuring racoon...done.
Starting IKE (ISAKMP/Oakley) server: racoon.
shalom-ext: -root- [/home/grantma/dms]
#
dns-slave1:
root@dns-slave1:/home/grantma# racoon-tool restart
Stopping IKE (ISAKMP/Oakley) server: racoon.
Flushing SAD and SPD...
SAD and SPD flushed.
Unloading IPSEC/crypto modules...
IPSEC/crypto modules unloaded.
Loading IPSEC/crypto modules...
IPSEC/crypto modules loaded.
Flushing SAD and SPD...
SAD and SPD flushed.
Loading SAD and SPD...
SAD and SPD loaded.
Configuring racoon...done.
Starting IKE (ISAKMP/Oakley) server: racoon.
root@dns-slave1:/home/grantma#
Note
Wait 10 minutes for IPSEC replay timing to expire. Then retry the telnet steps above.
Base Operating System: Debian Wheezy or later.
Create /etc/apt/apt.conf.d/00local.conf:
// No point in installing a lot of fat on VM servers
APT::Install-Recommends "0";
APT::Install-Suggests "0";
Create /etc/apt/sources.list.d/00local.conf:
deb http://deb-repo.devel.net.nz/debian/ wheezy main
deb-src http://deb-repo.devel.net.nz/debian/ wheezy main
Install these packages:
# apt-get install cron-apt screen tree procps psmisc sysstat sudo lsof open-vm-tools open-vm-dkms dms
If using netscript-2.4 instead of ifupdown to properly install it because of cyclic boot dependencies (I will look into this when have some spare time, and log an RC Debian bug):
# dpkg --force --purge ifupdown
# apt-get -f install
Further, for netscript-2.4, edit /etc/netscript/network.conf to configure static addressing. Look for IF_AUTO, set eth0_IPADDR, and further down comment out eth_start and eth_stop functions to turn off DHCP.
Note
For most setups, netscript-ipfilter is a suitable package for managing Linux filtering rules without replacing ifupdown.
Netscript-2.4 and netscript-ipfilter manage iptables and ip6tables via iptables-save/iptables-restore, and keeps a cyclic history which you can change back to if your filter changes go wrong via netscript ipfilter/ip6filter save/usebackup.
Then:
# aptitude update
# aptitude upgrade
Note
This is just a personal Debian prompt thing of mine. You might say I get too personal...
To fix shell prompt for larger terminals on master server makes typing in long zone_tool commands at shell a lot clearer:
# tar -C / -xzf shell.tar.gz
Replaces /etc/skel shell and /root dot files with single line feed to force use of file in /etc
Then edit /etc/environment.sh to turn off various things like umask 00002 for user id less than 1000.
Then follow Debian Install documentation.