- Fixing a broken postgresql
-
- Feb 02 17:53:34 <ppc123> hello... been a while since I've IRC'd
- Feb 02 17:54:45 <ppc123> I have a PGDATA that was 'saved' from a crashing/hacked server. created a new cluster, moved the PGDATA and replaced with the one that was 'saved'
- Feb 02 17:55:17 <ppc123> getting... FATAL "global/9191360"... file not found when trying to connect
- Feb 02 17:55:34 <RhodiumToad> ppc123: are you sure you got the _whole_ pgdata?
- Feb 02 17:55:59 <RhodiumToad> ppc123: and that you removed all existing files before putting the new ones in place?
- Feb 02 17:56:13 <ppc123> it is indeed not in the global folder... there is a 9191360_fsm and a 9191360_vm, but no 9191360
- Feb 02 17:56:38 <ppc123> no, I am not sure I got the whole pgdata cause it wasn't me doin the gettin' :D
- Feb 02 17:57:09 <ppc123> this is pg9.4
- Feb 02 17:57:27 <ario_> RhodiumToad: :(
- Feb 02 17:57:38 <RhodiumToad> ppc123: do you have a listing or copy of the saved files?
- Feb 02 17:57:50 <ppc123> I do have access to that
- Feb 02 17:58:13 <RhodiumToad> can you do a directory listing of what you have and put it on a paste site?
- Feb 02 17:59:00 <ppc123> RhodiumToad - like a 'tree' listing?
- Feb 02 17:59:11 <ppc123> or a specific dir list?
- Feb 02 17:59:36 <RhodiumToad> ppc123: tree listing would be good.
- Feb 02 18:00:21 <ppc123> Ok - It is the whole 9.4/, including data - that all does look 'complete'
- Feb 02 18:00:51 <ppc123> One other thing I looked at was the pg_resetxlog -n output
- Feb 02 18:02:02 <RhodiumToad> ... do not touch the pg_destroy_my_data command
- Feb 02 18:02:17 <ppc123> https://pastebin.com/fzYVWh8h
- Feb 02 18:02:41 <ppc123> and I will not touch that again :D
- Feb 02 18:02:52 <RhodiumToad> pg_controldata is the command to display that info, btw
- Feb 02 18:03:40 <ppc123> ok, I will have to reconnect with this tomorrow.
- Feb 02 18:03:47 <ppc123> shoulda come here sooner :D
-
-
- Feb 03 17:14:12 <ppc123_> am I still here?
- Feb 03 17:14:33 <ppc123_> RhodiumToad - Here is that broken PGDATA Dir listing
- Feb 03 17:14:37 <ppc123_> https://controlc.com/eda93dcf
- Feb 03 17:15:45 <ppc123_> <ppc123> getting... FATAL "global/9191360"... file not found when trying to connect
- Feb 03 17:22:38 <Myon> that's weirdly large OID numbers in global/
- Feb 03 17:24:16 <Myon> ppc123_: you could try `postgres -P` and hope the missing file is an index
-
- Feb 08 10:07:25 <ppc123> RhodiumToad - Here is that broken PGDATA Dir listing https://controlc.com/eda93dcf getting... FATAL "global/9191360"... file not found when trying to connect
- Feb 08 10:07:42 <ppc123> It was suggested to try postgres -P and hope it's just an index
-
- Feb 08 16:16:14 <ppc123> PG Startup errors - could not locate a valid checkpoint record
- Feb 08 16:16:18 <ppc123> https://controlc.com/71c18170
- Feb 08 16:17:12 <ppc123> Related to this: Here is that broken PGDATA Dir listing https://controlc.com/eda93dcf getting... FATAL "global/9191360"... file not found when trying to connect
- Feb 08 16:17:21 <peerce> what happened to that server prior to this ?
- Feb 08 16:17:50 <ppc123> It was died... User backed up PGDATA with copy, on windows to an external drive
- Feb 08 16:18:28 <peerce> was that copy made with the database server stopped ?
- Feb 08 16:18:37 <ppc123> not sure if disk corruption or pg was still running
- Feb 08 16:18:46 <peerce> if not, then its not coherent, unless pg_start_backup() and pg_stop_backup() was called around the backup
- Feb 08 16:19:05 <ppc123> yeah, that definitely did not happen
- Feb 08 16:20:47 <ppc123> and... their backups were encrypted by bitcoin hackers
- Feb 08 16:25:38 <peerce> oh gawd.
- Feb 08 16:26:13 <RhodiumToad> yeesh
- Feb 08 16:28:07 <RhodiumToad> ppc123: so the thing that strikes me about that directory listing is the pg_clog
- Feb 08 16:29:12 <RhodiumToad> having just one file there is odd, it suggests that this db has only ever had about 550k write transactions?
- Feb 08 16:29:46 <ppc123> i dunno. it is rather large for the application
- Feb 08 16:29:46 <RhodiumToad> is that at all plausible in your scenario?
- Feb 08 16:29:50 <ppc123> so that doen't make sense
- Feb 08 16:29:55 <RhodiumToad> what is large?
- Feb 08 16:30:09 <ppc123> the size of the dir - it's about 70GB all in
- Feb 08 16:30:23 <ppc123> which should be more than 550k writes
- Feb 08 16:30:42 <RhodiumToad> about 69GB of that is xlog
- Feb 08 16:31:09 <ppc123> puzzled... they're not using replication.
- Feb 08 16:31:15 <RhodiumToad> for whatever reason you seem to have xlog going back years
- Feb 08 16:31:22 <peerce> did they set an archive_command and never test it ?
- Feb 08 16:31:32 <RhodiumToad> do you know when the copy was taken?
- Feb 08 16:31:34 <peerce> a broken archive_command will cause xlog to grow forever
- Feb 08 16:31:37 <ppc123> 10/1/2020
- Feb 08 16:31:43 <RhodiumToad> large value of wal_keep_segments is another possibility
- Feb 08 16:31:56 <ppc123> I can look at their postgresql.conf
- Feb 08 16:32:29 <RhodiumToad> archive_status is empty, so archive_command wouldn't be the issue
- Feb 08 16:33:08 <RhodiumToad> can you run pg_controldata on the copy?
- Feb 08 16:33:56 <ilmari_> ppc123: is that the first of october or the tenth of january?
- Feb 08 16:33:59 <ppc123> yes 1 min - with any flags?
- Feb 08 16:34:03 <ppc123> oct 1st
- Feb 08 16:34:35 <ppc123> RhodiumToad - pg_controldata with any flags?
- Feb 08 16:34:56 <RhodiumToad> just the path to the data dir
- Feb 08 16:37:19 <ppc123> https://controlc.com/00e1845c
- Feb 08 16:37:55 <RhodiumToad> ugh
- Feb 08 16:38:08 <RhodiumToad> so, this copy was almost certainly taken while the db was still running
- Feb 08 16:38:41 <RhodiumToad> notice the control file says "in production"
- Feb 08 16:38:52 <ppc123> yep. that's handy.
- Feb 08 16:39:19 <ppc123> :/ since I learned so much today, where do I send .5 BTC?
- Feb 08 16:39:21 <RhodiumToad> and it gives a checkpoint location, but startup shows that location was in a recycled-and-not-yet-used segment
- Feb 08 16:40:10 <RhodiumToad> ..._how_ much?
- Feb 08 16:40:52 <ppc123> I think industry standard is .5 BTC.
- Feb 08 16:41:09 <peerce> isn't that like US$500 now or something?
- Feb 08 16:41:33 <ilmari> peerce: more like 22k
- Feb 08 16:41:44 <ppc123> RhodiumToad - you're saying it's - fubar'd ?
- Feb 08 16:42:46 <RhodiumToad> recovery prospects aren't good. maybe one of the consulting firms would take it on, but I'd hesitate a bit to try it over irc, even though I've helped people that way before
- Feb 08 16:43:23 <RhodiumToad> you've tried a resetxlog? (on a copy of the copy, of course - never do that on your only backup)
- Feb 08 16:43:44 <peerce> ppc123; fundamental problem is, taking a backup of lots of files took some finite time, and those files were being randomly modified during this time, so the backup isn't a coherent snapshot.
-
- Feb 08 16:44:46 <ppc123> I did try resetxlog
- Feb 08 16:44:59 <ppc123> the - pg_destroy_my_data_please command :D
- Feb 08 16:45:21 <peerce> the correct way is to use pg_basebackup (built in), or pgBackRest (3rd party, free), or to bracket your file system backup with pg_start/stop_backup() API calls.
- Feb 08 16:45:24 <ppc123> yeah - I would never recommend they do that - and didn't...
- Feb 08 16:45:48 <ppc123> this was done on assumptions that... it could be done.
- Feb 08 16:46:01 <RhodiumToad> ppc123: did it start after a resetxlog, and what did you try?
-
- Feb 08 16:46:40 <ppc123> let me do that 1 more time - I believe pg started but I got the global missing /somefile
- Feb 08 16:46:47 <ppc123> was last week...
- Feb 08 16:46:59 <RhodiumToad> yeah, I remember asking for the info then
- Feb 08 16:47:14 <RhodiumToad> but our paths failed to cross since
- Feb 08 16:47:16 <ppc123> ya - that is the other pastebin
- Feb 08 16:47:31 <ppc123> oh
- Feb 08 16:47:56 <ppc123> so - anything special - like XLOGFILE or OID/OFFSET to pass to resetxlog?
-
- Feb 08 16:49:21 <RhodiumToad> iirc the manpage has some suggestions for that. specify an xlogfile larger than the largest you have, and an xid of 1048576 should do
- Feb 08 16:49:32 <RhodiumToad> oid shouldn't matter much
-
-
- Feb 08 16:50:50 <RhodiumToad> https://www.postgresql.org/docs/9.4/app-pgresetxlog.html -- the "safe values can be determined" part
- Feb 08 16:50:59 <RhodiumToad> (this is for fairly small values of "safe")
-
- Feb 08 16:52:50 <ppc123> my largest pg_clog file is... 0000 :D
- Feb 08 16:53:09 <RhodiumToad> that's why I said 1048576
- Feb 08 16:53:32 <ppc123> ohh
- Feb 08 16:53:54 <ppc123> missed that... I was gonna do the math...
-
- Feb 08 16:55:34 <ppc123> did: pg_resetxlog.exe -f -x 1048576 ../data
- Feb 08 16:55:42 <ppc123> now try starting?
- Feb 08 16:56:48 <ppc123> pg_ctl or other?
-
- Feb 08 16:59:58 <ppc123> bah. global/9191360 no such file or dir
-
- Feb 08 17:01:15 <ppc123> i.e. psql -U postgres -h 127.0.0.1 -p 5433 postgres -> FATAL could not open file "global/9191360": no such thing
- Feb 08 17:04:53 <ppc123> welp, it was a good try.
- Feb 08 17:05:43 <RhodiumToad> ppc123: wait, there's more to try
- Feb 08 17:05:58 <ppc123> but wait, there's more?!
- Feb 08 17:06:01 <RhodiumToad> the next question is whether you can get in with system indexes disabled
- Feb 08 17:06:06 <ppc123> 1 min
-
- Feb 08 17:06:31 <RhodiumToad> easiest way to try that is to stop the server, and try going in with postgres --single -P
- Feb 08 17:07:45 <RhodiumToad> if you can get in with that, reindex system on any database may help
- Feb 08 17:08:21 <RhodiumToad> it's also possible to try that with the server running, by setting the environment var PGOPTIONS="-P" before running psql
-
-
- Feb 08 17:15:18 <ppc123> hrm... nope. still FATAL could not open global/9191360
- Feb 08 17:16:17 <RhodiumToad> which method did you use?
- Feb 08 17:16:22 <ppc123> tried... with PGOPTIONS="-P",
- Feb 08 17:16:34 <ppc123> then psql -U postgres -h 127.0.0.1 -p 5433 postgres
- Feb 08 17:16:54 <RhodiumToad> ppc123: can you try the --single method too
-
- Feb 08 17:18:07 <RhodiumToad> ppc123: this kind of error suggests that someone was doing either a vacuum full or a reindex of the global tables at the time the data was being copied
- Feb 08 17:18:31 <RhodiumToad> ppc123: does that seem likely?
- Feb 08 17:19:10 <ppc123> Anything is likely with this particular user.
- Feb 08 17:19:29 <ppc123> So, --single hits same global/blah not exists
- Feb 08 17:20:28 <ppc123> I have one more paste - to show that --single output
- Feb 08 17:21:08 <ppc123> https://controlc.com/bcce480d
- Feb 08 17:23:11 <ppc123> is global/ relating to users? or databases?
-
- Feb 08 17:23:38 <RhodiumToad> global/ contains the tables for pg_database, pg_shdepend, pg_authid, pg_auth_members, pg_tablespace
- Feb 08 17:23:55 <RhodiumToad> (might be a couple more I forget but those are the main ones)
- Feb 08 17:24:10 <RhodiumToad> users/groups/roles, databases, tablespaces
- Feb 08 17:24:11 <ppc123> hrm... don't suppose one from a working cluster could be grafted in
- Feb 08 17:24:30 <RhodiumToad> ... maybe.
- Feb 08 17:24:37 <ppc123> we do have one of those.
- Feb 08 17:24:42 <RhodiumToad> question is, which is which
- Feb 08 17:24:58 <RhodiumToad> the high filenode shows that the file has been vacuum full'd
- Feb 08 17:25:10 <RhodiumToad> (or clustered - something that rewrites the file, anyway)
- Feb 08 17:25:29 <RhodiumToad> can you try that --single again but include -P this time
- Feb 08 17:25:46 <RhodiumToad> postgres --single -d5 -P -D ../data
- Feb 08 17:26:12 <ppc123> same - doesn't start due to global
- Feb 08 17:26:19 <ppc123> without single it does start
- Feb 08 17:26:38 <RhodiumToad> so the file must be one of the actual tables and not an index
- Feb 08 17:26:47 <ppc123> but can't connect with anything - it does recognize the user exists or not
- Feb 08 17:26:52 <RhodiumToad> global contains the indexes for all the tables I mentioned above
- Feb 08 17:27:26 <RhodiumToad> does or doesn't?
- Feb 08 17:28:10 <ppc123> does recognize users, but does not connect to db - template0, template1,postgres, etc
- Feb 08 17:28:31 <RhodiumToad> wait a sec. let me work this out
- Feb 08 17:28:39 <ppc123> it knows which roles exist
-
- Feb 08 17:29:23 <RhodiumToad> ok. there are supposed to be 7 tables in global/ and you have 6, and one obviously missing
- Feb 08 17:29:35 <RhodiumToad> so we can work out which is which by elimination I guess
- Feb 08 17:29:59 <RhodiumToad> can you do a hexdump of the global/pg_filenode.map file?
- Feb 08 17:31:00 <ppc123> sure, 1 min
-
- Feb 08 17:31:08 <Myon> I guess you could just "touch" the file and see if that helps
- Feb 08 17:31:17 <Myon> if not you can still delete it again...
-
- Feb 08 17:31:41 <RhodiumToad> that could work if it's pg_auth_members that's missing. for the others, not so much
- Feb 08 17:31:55 <RhodiumToad> maybe for pg_shdepend too
- Feb 08 17:31:56 <Myon> maybe it will at least say which one is missing
-
- Feb 08 17:32:28 <RhodiumToad> so yeah, try creating that as an empty (0 length) file and see what changes
- Feb 08 17:33:03 <RhodiumToad> (this is windows, I don't think there's a "touch", but you can copy nul or something, it's been a while)
-
- Feb 08 17:34:26 <ppc123> so... pg_filenode.map - I brought it on my linux - like... open it with a hex editor?
- Feb 08 17:35:11 <ppc123> or try creating an empty global/919... file?
-
- Feb 08 17:35:28 <Myon> (there should be plenty of empty other files to copy)
- Feb 08 17:35:39 <ppc123> there are
- Feb 08 17:35:55 <RhodiumToad> ppc123: on linux, hexdump -C filename should work
-
- Feb 08 17:37:33 <ppc123> filenode_map hex - https://controlc.com/79a4a43d
- Feb 08 17:38:11 <ppc123> and create an empty global/9191360 file ?
-
- Feb 08 17:40:31 <RhodiumToad> yes
- Feb 08 17:40:59 <RhodiumToad> uh.
- Feb 08 17:41:14 <RhodiumToad> so the file it's missing is pg_database, I'm not sure whether to believe that
-
- Feb 08 17:41:37 <ppc123> oh, well... progress - not getting the missing global
- Feb 08 17:41:50 <ppc123> but... databse "postgres" does not exist
- Feb 08 17:42:05 <ppc123> or template0, template1
-
- Feb 08 17:44:16 <RhodiumToad> that's consistent with it being pg_database that's missing :-)
- Feb 08 17:44:29 <RhodiumToad> that could, in theory, be reconstructed
- Feb 08 17:44:40 <ppc123> re-initted?
- Feb 08 17:44:56 <peerce> the question is, what OTHER databases were on this cluster that you need ?
- Feb 08 17:45:12 <ppc123> yeah - those would be the more important ones :D
- Feb 08 17:45:20 <RhodiumToad> the directory listing shows there were 5 dbs other than the 3 standard ones?
- Feb 08 17:46:02 <ppc123> sounds about correct.
- Feb 08 17:46:07 <RhodiumToad> fortunately their oids can be recovered from the directory names and most of the other stuff in pg_database probably doesn't matter
-
- Feb 08 17:47:06 <ppc123> yes - that is correct - about 4 or 5 - testing, dev, prod, etc
- Feb 08 17:47:33 <RhodiumToad> the names might or might not be recoverable from one of the indexes.
- Feb 08 17:47:46 <ppc123> names aren't important really
- Feb 08 17:47:54 <ppc123> we can figure out which is what from contents
- Feb 08 17:48:40 <ppc123> db1, db2, db3, db4 is fine :D
-
- Feb 08 17:49:28 <RhodiumToad> there'll be some work involved. how much time do you have available?
- Feb 08 17:49:59 <ppc123> Days?
-
- Feb 08 17:50:47 <ppc123> when I checked on this db today - they said "Oh, we had a database guy come look at this..." so, I had to start over today :D
- Feb 08 17:51:29 <RhodiumToad> ppc123: ok. I need to take a break for a few minutes, I'll be back
- Feb 08 17:51:34 <ppc123> NP
- Feb 08 17:51:55 <ppc123> do I need any files from this thing - I'm on a zoom
-
- Feb 08 17:58:25 <RhodiumToad> ppc123: the plan of attack I'm thinking of is: create a new cluster with initdb (in a different dir), find the pg_database file using select pg_relation_filepath('pg_database'::regclass);
- Feb 08 17:58:56 <RhodiumToad> copying that in might get you access to template1 and/or postgres when going in with --single -P
- Feb 08 17:59:02 <ppc123> ok
- Feb 08 17:59:12 <RhodiumToad> then reindex system postgres;
- Feb 08 18:00:10 <RhodiumToad> then create a CSV file containing entries for your actual dbs, and insert it using COPY pg_database FROM 'file' WITH (format CSV, OIDS);
- Feb 08 18:00:31 <RhodiumToad> the data that goes into that file will take a bit of working out.
- Feb 08 18:00:50 <RhodiumToad> first question is do you know what encoding and locale the old system was using?
-
-
- Feb 08 18:02:08 <RhodiumToad> the lines of the CSV file should be something like:
- Feb 08 18:03:28 <RhodiumToad> nnn,db_nnn,10,6,"C","C",f,t,-1,12487,648,1,1663,
-
- Feb 08 18:03:56 <RhodiumToad> where nnn is the directory name from base/ for each directory >= 16384
- Feb 08 18:04:21 <RhodiumToad> the 6 needs to be changed if the encoding is not UTF8, and the "C" needs to be changed to the locale name if it's not C
-
- Feb 08 18:04:32 <RhodiumToad> (the trailing , is required)
- Feb 08 18:06:42 <ppc123> encoding should be UTF8
- Feb 08 18:07:15 <ppc123> Ok, I got a global from a good cluster
- Feb 08 18:07:20 <ppc123> \q
- Feb 08 18:09:15 <ppc123> good global is 12131
- Feb 08 18:09:30 <ppc123> do I drop that in the broken dir as 9191360 ?
- Feb 08 18:09:53 <RhodiumToad> yes
-
- Feb 08 18:12:34 <ppc123> ok, I am connected to the broken cluster, and see the db names now
- Feb 08 18:12:49 <RhodiumToad> just the postgres, template0, template1 dbs?
- Feb 08 18:13:18 <ppc123> well, I took the naming from an existing running cluster - so there are the other databases.
- Feb 08 18:13:31 <RhodiumToad> er
- Feb 08 18:13:52 <RhodiumToad> can you connect to any of them? because it's not likely that the oids are the same between clusters
-
- Feb 08 18:14:35 <ppc123> they are not the same OID's - testdb owner unknown (OID=292121233_
- Feb 08 18:15:46 <ppc123> I did not do the CSV part
-
- Feb 08 18:16:21 <RhodiumToad> I mean, if you try and connect to any of those dbs it would likely complain that the directory is missing, no?
- Feb 08 18:16:51 <RhodiumToad> filling in entries for the db oids you actually have is what the csv part is for
- Feb 08 18:20:01 <ppc123> ok, let me follow your instructions
- Feb 08 18:20:03 <RhodiumToad> I need to go eat now, back in maybe half an hour
- Feb 08 18:20:15 <ppc123> thanks for getting me this far
-
- Feb 08 18:50:21 <RhodiumToad> back
- Feb 08 18:58:30 <ppc123> so, what I've come up with
- Feb 08 18:59:41 <ppc123> https://controlc.com/cc01e189
- Feb 08 19:00:31 <ppc123> getting a ERROR: extra data after last expected column...
- Feb 08 19:00:47 <RhodiumToad> I think you missed one
- Feb 08 19:00:57 <RhodiumToad> what's that ' ' at the end?
- Feb 08 19:01:14 <ppc123> datacl
-
- Feb 08 19:01:19 <RhodiumToad> after 1663 there should be one comma and a newline
- Feb 08 19:01:25 <RhodiumToad> datacl should be left null for now
- Feb 08 19:01:39 <RhodiumToad> also that ; at the end is wrong
- Feb 08 19:01:48 <RhodiumToad> and there should be no ' quotes anywhere
-
- Feb 08 19:02:08 <RhodiumToad> the db_nnn does not require quotes
- Feb 08 19:02:13 <ppc123> oh
- Feb 08 19:03:30 <ppc123> that copied.
- Feb 08 19:04:00 <RhodiumToad> now the question is whether you can connect to any of those dbs
- Feb 08 19:05:06 <ppc123> I'm gonna try this 12GB one...
- Feb 08 19:06:44 <ppc123> hrm... might have my encoding wrong - BUT... I see tables
- Feb 08 19:07:22 <ppc123> ph - well meaybe not
- Feb 08 19:07:27 <RhodiumToad> bear in mind that there may be inconsistencies in the data
- Feb 08 19:07:32 <RhodiumToad> maybe not what?
- Feb 08 19:07:43 <ppc123> nope.. I'm rushing ahead.
- Feb 08 19:07:54 <ppc123> should I reindex?
- Feb 08 19:08:18 <peerce> anything that was being upodated during the backup is going to have a mix of old and new tuples and out of sync indexes
- Feb 08 19:08:36 <peerce> FK relations might be out of sync, too
- Feb 08 19:12:27 <RhodiumToad> ppc123: what are you seeing?
- Feb 08 19:13:43 <RhodiumToad> generally speaking false positives are only likely to happen on the table level
- Feb 08 19:14:06 <RhodiumToad> some kinds of queries have to SI lock the whole table (e.g. because it was seq-scanned)
- Feb 08 19:14:43 <RhodiumToad> generally the escalation path is individual index rows -> index blocks (tens or hundreds of adjacent rows) -> whole table,
- Feb 08 19:14:46 <ppc123> running a reindex
- Feb 08 19:14:52 <RhodiumToad> and individual table blocks -> whole table
-
- Feb 08 19:15:06 <RhodiumToad> ppc123: what did you _see_
-
- Feb 08 19:15:36 <ppc123> ERROR about missing base/
-
- Feb 08 19:15:45 <ppc123> when select * from table
- Feb 08 19:15:48 <RhodiumToad> missing base/WHAT?
- Feb 08 19:16:09 <RhodiumToad> and are you still using -P when connecting? (you should)
- Feb 08 19:26:44 <ppc123> ok - so I am seeeing - ERROR: missing chunk number - for toast value 70677992 in pg_toast_2619
- Feb 08 19:27:34 <ppc123> using index "pg_toast_2619_index" despite IgnoreSystemINdexes
-
- Feb 08 19:29:16 <ppc123> Not sure where I saw missing Base
- Feb 08 19:29:28 <RhodiumToad> fortunately that has an easy fix: add -O to the connect flags and do a truncate table pg_statistic;
- Feb 08 19:29:48 <RhodiumToad> (you don't need pg_statistic's contents)
- Feb 08 19:31:57 <ppc123> done
- Feb 08 19:39:18 <ppc123> this server is slow...
- Feb 08 19:40:12 <RhodiumToad> -P doesn't help with that, but probably best to stick with it (unless you already did a reindex system in this db)
- Feb 08 19:40:40 <ppc123> I'll let it finish it's thing
- Feb 08 19:40:56 <ppc123> it's still in P mode.
-
- Feb 08 19:42:36 <RhodiumToad> -P shouldn't make a big difference to access to actual data, but it does affect system catalog lookups
-
- Feb 08 19:52:52 <ppc123> hrm... missing chunk number 0 for toast value 71436530 in pg_toast_2619
- Feb 08 19:53:18 <RhodiumToad> is the same db as before?
- Feb 08 19:53:23 <ppc123> yes
- Feb 08 19:53:24 <peerce> oh yeah, add toast tables to the things that will be out of sync if there were active updates/inserts during the interval the backup took
- Feb 08 19:53:36 <RhodiumToad> the truncate table pg_statistic; has to be run in the same db that gets the error
-
- Feb 08 19:54:26 <RhodiumToad> after doing that truncate, it should be impossible to get errors on pg_toast_2619
- Feb 08 19:55:01 <RhodiumToad> peerce: fortunately, pg_toast_2619 is the toast table for pg_statistic, which doesn't contain anything important
-
- Feb 08 19:56:04 <ppc123> how do you know that?
- Feb 08 19:56:10 <ppc123> from the dir listing?
- Feb 08 19:56:24 <RhodiumToad> 2619 is the oid of pg_statistic.
- Feb 08 19:56:32 <ppc123> ok
- Feb 08 19:56:34 <RhodiumToad> select 'pg_statistic'::regclass::oid;
- Feb 08 19:57:01 <RhodiumToad> oids < 10000 have fixed meanings for a given major version of pg
- Feb 08 19:57:26 <RhodiumToad> oids 10000 to 16383 are allocated during initdb, so they're system objects but not necessarily fixed
- Feb 08 19:58:12 <ppc123> hrm... I am gettin a hcaracter with byte sequence 0xef 0xbc in encoding UTF8 has no equivaleent in WIN1252
- Feb 08 19:58:28 <ppc123> wonder if I need to select SQLASCII
- Feb 08 19:58:33 <RhodiumToad> set your client encoding to UTF8
- Feb 08 19:58:41 <RhodiumToad> oh
- Feb 08 19:58:43 <RhodiumToad> hm
- Feb 08 19:59:02 <RhodiumToad> you may have the encoding wrong (remember what I said about that "6" in pg_database?)
- Feb 08 19:59:08 <ppc123> yep.
- Feb 08 19:59:21 <RhodiumToad> if you're actually using win1252 and not utf8, that 6 needs to be something else
- Feb 08 19:59:40 <ppc123> let me check this other cluster.
- Feb 08 19:59:41 <RhodiumToad> you copied this pg_database from a working system? what are the values of "datencoding" in pg_database?
-
- Feb 08 20:01:58 <ppc123> neat... wonder if it's windows terminal client...
-
- Feb 08 20:02:50 <RhodiumToad> psql will set a client_encoding corresponding to its guess at the display it's running in
- Feb 08 20:02:56 <RhodiumToad> which on windows may be completely wrong
- Feb 08 20:03:07 <RhodiumToad> setting the client encoding to sql_ascii may help
- Feb 08 20:04:32 <ppc123> yeah - PGadmin selects the data fine
- Feb 08 20:04:45 <ppc123> datencoding is Windows...1525
- Feb 08 20:05:24 <ppc123> English_United States.1252 - likely the EDB PG Installer default
-
- Feb 08 20:05:47 <ppc123> I think that is fine - just a problem from CLI
- Feb 08 20:05:51 <RhodiumToad> no
- Feb 08 20:06:20 <RhodiumToad> if you used 6 in pg_database's encoding column (in that CSV data), that specifies UTF8, not win1252
- Feb 08 20:06:20 <ppc123> OK, I will try sql_ascii
- Feb 08 20:06:28 <ppc123> oh
- Feb 08 20:06:43 <RhodiumToad> it's important that the value of datencoding actually match the stored data (obviously!)
- Feb 08 20:06:53 <RhodiumToad> so you may need to modify those
- Feb 08 20:07:43 <RhodiumToad> (you can do an update on pg_database to change them)
- Feb 08 20:07:57 <RhodiumToad> but after doing that, disconnect and reconnect, and reindex
- Feb 08 20:08:15 <ppc123> So, when psql connects - says... Console code page (437) differs from Windows (1252) -
- Feb 08 20:09:12 <ppc123> 'terminal' code page does say 437 (OEM - United States) on this windows server.
- Feb 08 20:09:25 <ppc123> sorry - Command Prompt Properties
-
- Feb 08 20:10:22 <ppc123> And - for this application the encoding should be UTF-8 - if they at some point installed PG without RTFM, it could have been SQL-ASCII back in pg8.x days
-
- Feb 08 20:20:16 <RhodiumToad> ppc123: did you check the value of datencoding in pg_database on a working system
-
- Feb 08 20:21:37 <ppc123> I did - it is what it *should* be, but that doesn't mean the data is 'correct' - It's not all rows, it's like 1 row that has this problem
-
- Feb 08 20:22:06 <ppc123> and that is probably the client applications' fault, or some old issue
- Feb 08 20:22:22 <ppc123> I need to see if a backup/restore works.
- Feb 08 20:26:57 <RhodiumToad> ppc123: yes, you should be aiming to do a pg_dump and then restore that into a fresh initdb elsewhere
- Feb 08 20:28:53 <ppc123> yep
-
- Feb 08 21:58:57 <ppc123> RhodiumToad - thanks for all your guidance on this - backup worked, waiting for restore to finish.
- Feb 08 21:59:14 <RhodiumToad> just remember that the data may be inconsistent
- Feb 08 21:59:42 <ppc123> we at least have readable tables that we can export from
- Feb 08 22:00:46 <ppc123> it will be put into a freshly init'd structure
- Feb 08 22:01:02 <ppc123> and tested