Snippet content copied to clipboard.
Are you sure to delete this snippet? No, don't delete
  1. Fixing a broken postgresql
  2. Feb 02 17:53:34 <ppc123> hello... been a while since I've IRC'd
  3. Feb 02 17:54:45 <ppc123> I have a PGDATA that was 'saved' from a crashing/hacked server. created a new cluster, moved the PGDATA and replaced with the one that was 'saved'
  4. Feb 02 17:55:17 <ppc123> getting... FATAL "global/9191360"... file not found when trying to connect
  5. Feb 02 17:55:34 <RhodiumToad> ppc123: are you sure you got the _whole_ pgdata?
  6. Feb 02 17:55:59 <RhodiumToad> ppc123: and that you removed all existing files before putting the new ones in place?
  7. Feb 02 17:56:13 <ppc123> it is indeed not in the global folder... there is a 9191360_fsm and a 9191360_vm, but no 9191360
  8. Feb 02 17:56:38 <ppc123> no, I am not sure I got the whole pgdata cause it wasn't me doin the gettin' :D
  9. Feb 02 17:57:09 <ppc123> this is pg9.4
  10. Feb 02 17:57:27 <ario_> RhodiumToad: :(
  11. Feb 02 17:57:38 <RhodiumToad> ppc123: do you have a listing or copy of the saved files?
  12. Feb 02 17:57:50 <ppc123> I do have access to that
  13. Feb 02 17:58:13 <RhodiumToad> can you do a directory listing of what you have and put it on a paste site?
  14. Feb 02 17:59:00 <ppc123> RhodiumToad - like a 'tree' listing?
  15. Feb 02 17:59:11 <ppc123> or a specific dir list?
  16. Feb 02 17:59:36 <RhodiumToad> ppc123: tree listing would be good.
  17. Feb 02 18:00:21 <ppc123> Ok - It is the whole 9.4/, including data - that all does look 'complete'
  18. Feb 02 18:00:51 <ppc123> One other thing I looked at was the pg_resetxlog -n output
  19. Feb 02 18:02:02 <RhodiumToad> ... do not touch the pg_destroy_my_data command
  20. Feb 02 18:02:17 <ppc123> https://pastebin.com/fzYVWh8h
  21. Feb 02 18:02:41 <ppc123> and I will not touch that again :D
  22. Feb 02 18:02:52 <RhodiumToad> pg_controldata is the command to display that info, btw
  23. Feb 02 18:03:40 <ppc123> ok, I will have to reconnect with this tomorrow.
  24. Feb 02 18:03:47 <ppc123> shoulda come here sooner :D
  25. Feb 03 17:14:12 <ppc123_> am I still here?
  26. Feb 03 17:14:33 <ppc123_> RhodiumToad - Here is that broken PGDATA Dir listing
  27. Feb 03 17:14:37 <ppc123_> https://controlc.com/eda93dcf
  28. Feb 03 17:15:45 <ppc123_> <ppc123> getting... FATAL "global/9191360"... file not found when trying to connect
  29. Feb 03 17:22:38 <Myon> that's weirdly large OID numbers in global/
  30. Feb 03 17:24:16 <Myon> ppc123_: you could try `postgres -P` and hope the missing file is an index
  31. Feb 08 10:07:25 <ppc123> RhodiumToad - Here is that broken PGDATA Dir listing https://controlc.com/eda93dcf getting... FATAL "global/9191360"... file not found when trying to connect
  32. Feb 08 10:07:42 <ppc123> It was suggested to try postgres -P and hope it's just an index
  33. Feb 08 16:16:14 <ppc123> PG Startup errors - could not locate a valid checkpoint record
  34. Feb 08 16:16:18 <ppc123> https://controlc.com/71c18170
  35. Feb 08 16:17:12 <ppc123> Related to this: Here is that broken PGDATA Dir listing https://controlc.com/eda93dcf getting... FATAL "global/9191360"... file not found when trying to connect
  36. Feb 08 16:17:21 <peerce> what happened to that server prior to this ?
  37. Feb 08 16:17:50 <ppc123> It was died... User backed up PGDATA with copy, on windows to an external drive
  38. Feb 08 16:18:28 <peerce> was that copy made with the database server stopped ?
  39. Feb 08 16:18:37 <ppc123> not sure if disk corruption or pg was still running
  40. Feb 08 16:18:46 <peerce> if not, then its not coherent, unless pg_start_backup() and pg_stop_backup() was called around the backup
  41. Feb 08 16:19:05 <ppc123> yeah, that definitely did not happen
  42. Feb 08 16:20:47 <ppc123> and... their backups were encrypted by bitcoin hackers
  43. Feb 08 16:25:38 <peerce> oh gawd.
  44. Feb 08 16:26:13 <RhodiumToad> yeesh
  45. Feb 08 16:28:07 <RhodiumToad> ppc123: so the thing that strikes me about that directory listing is the pg_clog
  46. Feb 08 16:29:12 <RhodiumToad> having just one file there is odd, it suggests that this db has only ever had about 550k write transactions?
  47. Feb 08 16:29:46 <ppc123> i dunno. it is rather large for the application
  48. Feb 08 16:29:46 <RhodiumToad> is that at all plausible in your scenario?
  49. Feb 08 16:29:50 <ppc123> so that doen't make sense
  50. Feb 08 16:29:55 <RhodiumToad> what is large?
  51. Feb 08 16:30:09 <ppc123> the size of the dir - it's about 70GB all in
  52. Feb 08 16:30:23 <ppc123> which should be more than 550k writes
  53. Feb 08 16:30:42 <RhodiumToad> about 69GB of that is xlog
  54. Feb 08 16:31:09 <ppc123> puzzled... they're not using replication.
  55. Feb 08 16:31:15 <RhodiumToad> for whatever reason you seem to have xlog going back years
  56. Feb 08 16:31:22 <peerce> did they set an archive_command and never test it ?
  57. Feb 08 16:31:32 <RhodiumToad> do you know when the copy was taken?
  58. Feb 08 16:31:34 <peerce> a broken archive_command will cause xlog to grow forever
  59. Feb 08 16:31:37 <ppc123> 10/1/2020
  60. Feb 08 16:31:43 <RhodiumToad> large value of wal_keep_segments is another possibility
  61. Feb 08 16:31:56 <ppc123> I can look at their postgresql.conf
  62. Feb 08 16:32:29 <RhodiumToad> archive_status is empty, so archive_command wouldn't be the issue
  63. Feb 08 16:33:08 <RhodiumToad> can you run pg_controldata on the copy?
  64. Feb 08 16:33:56 <ilmari_> ppc123: is that the first of october or the tenth of january?
  65. Feb 08 16:33:59 <ppc123> yes 1 min - with any flags?
  66. Feb 08 16:34:03 <ppc123> oct 1st
  67. Feb 08 16:34:35 <ppc123> RhodiumToad - pg_controldata with any flags?
  68. Feb 08 16:34:56 <RhodiumToad> just the path to the data dir
  69. Feb 08 16:37:19 <ppc123> https://controlc.com/00e1845c
  70. Feb 08 16:37:55 <RhodiumToad> ugh
  71. Feb 08 16:38:08 <RhodiumToad> so, this copy was almost certainly taken while the db was still running
  72. Feb 08 16:38:41 <RhodiumToad> notice the control file says "in production"
  73. Feb 08 16:38:52 <ppc123> yep. that's handy.
  74. Feb 08 16:39:19 <ppc123> :/ since I learned so much today, where do I send .5 BTC?
  75. Feb 08 16:39:21 <RhodiumToad> and it gives a checkpoint location, but startup shows that location was in a recycled-and-not-yet-used segment
  76. Feb 08 16:40:10 <RhodiumToad> ..._how_ much?
  77. Feb 08 16:40:52 <ppc123> I think industry standard is .5 BTC.
  78. Feb 08 16:41:09 <peerce> isn't that like US$500 now or something?
  79. Feb 08 16:41:33 <ilmari> peerce: more like 22k
  80. Feb 08 16:41:44 <ppc123> RhodiumToad - you're saying it's - fubar'd ?
  81. Feb 08 16:42:46 <RhodiumToad> recovery prospects aren't good. maybe one of the consulting firms would take it on, but I'd hesitate a bit to try it over irc, even though I've helped people that way before
  82. Feb 08 16:43:23 <RhodiumToad> you've tried a resetxlog? (on a copy of the copy, of course - never do that on your only backup)
  83. Feb 08 16:43:44 <peerce> ppc123; fundamental problem is, taking a backup of lots of files took some finite time, and those files were being randomly modified during this time, so the backup isn't a coherent snapshot.
  84. Feb 08 16:44:46 <ppc123> I did try resetxlog
  85. Feb 08 16:44:59 <ppc123> the - pg_destroy_my_data_please command :D
  86. Feb 08 16:45:21 <peerce> the correct way is to use pg_basebackup (built in), or pgBackRest (3rd party, free), or to bracket your file system backup with pg_start/stop_backup() API calls.
  87. Feb 08 16:45:24 <ppc123> yeah - I would never recommend they do that - and didn't...
  88. Feb 08 16:45:48 <ppc123> this was done on assumptions that... it could be done.
  89. Feb 08 16:46:01 <RhodiumToad> ppc123: did it start after a resetxlog, and what did you try?
  90. Feb 08 16:46:40 <ppc123> let me do that 1 more time - I believe pg started but I got the global missing /somefile
  91. Feb 08 16:46:47 <ppc123> was last week...
  92. Feb 08 16:46:59 <RhodiumToad> yeah, I remember asking for the info then
  93. Feb 08 16:47:14 <RhodiumToad> but our paths failed to cross since
  94. Feb 08 16:47:16 <ppc123> ya - that is the other pastebin
  95. Feb 08 16:47:31 <ppc123> oh
  96. Feb 08 16:47:56 <ppc123> so - anything special - like XLOGFILE or OID/OFFSET to pass to resetxlog?
  97. Feb 08 16:49:21 <RhodiumToad> iirc the manpage has some suggestions for that. specify an xlogfile larger than the largest you have, and an xid of 1048576 should do
  98. Feb 08 16:49:32 <RhodiumToad> oid shouldn't matter much
  99. Feb 08 16:50:50 <RhodiumToad> https://www.postgresql.org/docs/9.4/app-pgresetxlog.html -- the "safe values can be determined" part
  100. Feb 08 16:50:59 <RhodiumToad> (this is for fairly small values of "safe")
  101. Feb 08 16:52:50 <ppc123> my largest pg_clog file is... 0000 :D
  102. Feb 08 16:53:09 <RhodiumToad> that's why I said 1048576
  103. Feb 08 16:53:32 <ppc123> ohh
  104. Feb 08 16:53:54 <ppc123> missed that... I was gonna do the math...
  105. Feb 08 16:55:34 <ppc123> did: pg_resetxlog.exe -f -x 1048576 ../data
  106. Feb 08 16:55:42 <ppc123> now try starting?
  107. Feb 08 16:56:48 <ppc123> pg_ctl or other?
  108. Feb 08 16:59:58 <ppc123> bah. global/9191360 no such file or dir
  109. Feb 08 17:01:15 <ppc123> i.e. psql -U postgres -h 127.0.0.1 -p 5433 postgres -> FATAL could not open file "global/9191360": no such thing
  110. Feb 08 17:04:53 <ppc123> welp, it was a good try.
  111. Feb 08 17:05:43 <RhodiumToad> ppc123: wait, there's more to try
  112. Feb 08 17:05:58 <ppc123> but wait, there's more?!
  113. Feb 08 17:06:01 <RhodiumToad> the next question is whether you can get in with system indexes disabled
  114. Feb 08 17:06:06 <ppc123> 1 min
  115. Feb 08 17:06:31 <RhodiumToad> easiest way to try that is to stop the server, and try going in with postgres --single -P
  116. Feb 08 17:07:45 <RhodiumToad> if you can get in with that, reindex system on any database may help
  117. Feb 08 17:08:21 <RhodiumToad> it's also possible to try that with the server running, by setting the environment var PGOPTIONS="-P" before running psql
  118. Feb 08 17:15:18 <ppc123> hrm... nope. still FATAL could not open global/9191360
  119. Feb 08 17:16:17 <RhodiumToad> which method did you use?
  120. Feb 08 17:16:22 <ppc123> tried... with PGOPTIONS="-P",
  121. Feb 08 17:16:34 <ppc123> then psql -U postgres -h 127.0.0.1 -p 5433 postgres
  122. Feb 08 17:16:54 <RhodiumToad> ppc123: can you try the --single method too
  123. Feb 08 17:18:07 <RhodiumToad> ppc123: this kind of error suggests that someone was doing either a vacuum full or a reindex of the global tables at the time the data was being copied
  124. Feb 08 17:18:31 <RhodiumToad> ppc123: does that seem likely?
  125. Feb 08 17:19:10 <ppc123> Anything is likely with this particular user.
  126. Feb 08 17:19:29 <ppc123> So, --single hits same global/blah not exists
  127. Feb 08 17:20:28 <ppc123> I have one more paste - to show that --single output
  128. Feb 08 17:21:08 <ppc123> https://controlc.com/bcce480d
  129. Feb 08 17:23:11 <ppc123> is global/ relating to users? or databases?
  130. Feb 08 17:23:38 <RhodiumToad> global/ contains the tables for pg_database, pg_shdepend, pg_authid, pg_auth_members, pg_tablespace
  131. Feb 08 17:23:55 <RhodiumToad> (might be a couple more I forget but those are the main ones)
  132. Feb 08 17:24:10 <RhodiumToad> users/groups/roles, databases, tablespaces
  133. Feb 08 17:24:11 <ppc123> hrm... don't suppose one from a working cluster could be grafted in
  134. Feb 08 17:24:30 <RhodiumToad> ... maybe.
  135. Feb 08 17:24:37 <ppc123> we do have one of those.
  136. Feb 08 17:24:42 <RhodiumToad> question is, which is which
  137. Feb 08 17:24:58 <RhodiumToad> the high filenode shows that the file has been vacuum full'd
  138. Feb 08 17:25:10 <RhodiumToad> (or clustered - something that rewrites the file, anyway)
  139. Feb 08 17:25:29 <RhodiumToad> can you try that --single again but include -P this time
  140. Feb 08 17:25:46 <RhodiumToad> postgres --single -d5 -P -D ../data
  141. Feb 08 17:26:12 <ppc123> same - doesn't start due to global
  142. Feb 08 17:26:19 <ppc123> without single it does start
  143. Feb 08 17:26:38 <RhodiumToad> so the file must be one of the actual tables and not an index
  144. Feb 08 17:26:47 <ppc123> but can't connect with anything - it does recognize the user exists or not
  145. Feb 08 17:26:52 <RhodiumToad> global contains the indexes for all the tables I mentioned above
  146. Feb 08 17:27:26 <RhodiumToad> does or doesn't?
  147. Feb 08 17:28:10 <ppc123> does recognize users, but does not connect to db - template0, template1,postgres, etc
  148. Feb 08 17:28:31 <RhodiumToad> wait a sec. let me work this out
  149. Feb 08 17:28:39 <ppc123> it knows which roles exist
  150. Feb 08 17:29:23 <RhodiumToad> ok. there are supposed to be 7 tables in global/ and you have 6, and one obviously missing
  151. Feb 08 17:29:35 <RhodiumToad> so we can work out which is which by elimination I guess
  152. Feb 08 17:29:59 <RhodiumToad> can you do a hexdump of the global/pg_filenode.map file?
  153. Feb 08 17:31:00 <ppc123> sure, 1 min
  154. Feb 08 17:31:08 <Myon> I guess you could just "touch" the file and see if that helps
  155. Feb 08 17:31:17 <Myon> if not you can still delete it again...
  156. Feb 08 17:31:41 <RhodiumToad> that could work if it's pg_auth_members that's missing. for the others, not so much
  157. Feb 08 17:31:55 <RhodiumToad> maybe for pg_shdepend too
  158. Feb 08 17:31:56 <Myon> maybe it will at least say which one is missing
  159. Feb 08 17:32:28 <RhodiumToad> so yeah, try creating that as an empty (0 length) file and see what changes
  160. Feb 08 17:33:03 <RhodiumToad> (this is windows, I don't think there's a "touch", but you can copy nul or something, it's been a while)
  161. Feb 08 17:34:26 <ppc123> so... pg_filenode.map - I brought it on my linux - like... open it with a hex editor?
  162. Feb 08 17:35:11 <ppc123> or try creating an empty global/919... file?
  163. Feb 08 17:35:28 <Myon> (there should be plenty of empty other files to copy)
  164. Feb 08 17:35:39 <ppc123> there are
  165. Feb 08 17:35:55 <RhodiumToad> ppc123: on linux, hexdump -C filename should work
  166. Feb 08 17:37:33 <ppc123> filenode_map hex - https://controlc.com/79a4a43d
  167. Feb 08 17:38:11 <ppc123> and create an empty global/9191360 file ?
  168. Feb 08 17:40:31 <RhodiumToad> yes
  169. Feb 08 17:40:59 <RhodiumToad> uh.
  170. Feb 08 17:41:14 <RhodiumToad> so the file it's missing is pg_database, I'm not sure whether to believe that
  171. Feb 08 17:41:37 <ppc123> oh, well... progress - not getting the missing global
  172. Feb 08 17:41:50 <ppc123> but... databse "postgres" does not exist
  173. Feb 08 17:42:05 <ppc123> or template0, template1
  174. Feb 08 17:44:16 <RhodiumToad> that's consistent with it being pg_database that's missing :-)
  175. Feb 08 17:44:29 <RhodiumToad> that could, in theory, be reconstructed
  176. Feb 08 17:44:40 <ppc123> re-initted?
  177. Feb 08 17:44:56 <peerce> the question is, what OTHER databases were on this cluster that you need ?
  178. Feb 08 17:45:12 <ppc123> yeah - those would be the more important ones :D
  179. Feb 08 17:45:20 <RhodiumToad> the directory listing shows there were 5 dbs other than the 3 standard ones?
  180. Feb 08 17:46:02 <ppc123> sounds about correct.
  181. Feb 08 17:46:07 <RhodiumToad> fortunately their oids can be recovered from the directory names and most of the other stuff in pg_database probably doesn't matter
  182. Feb 08 17:47:06 <ppc123> yes - that is correct - about 4 or 5 - testing, dev, prod, etc
  183. Feb 08 17:47:33 <RhodiumToad> the names might or might not be recoverable from one of the indexes.
  184. Feb 08 17:47:46 <ppc123> names aren't important really
  185. Feb 08 17:47:54 <ppc123> we can figure out which is what from contents
  186. Feb 08 17:48:40 <ppc123> db1, db2, db3, db4 is fine :D
  187. Feb 08 17:49:28 <RhodiumToad> there'll be some work involved. how much time do you have available?
  188. Feb 08 17:49:59 <ppc123> Days?
  189. Feb 08 17:50:47 <ppc123> when I checked on this db today - they said "Oh, we had a database guy come look at this..." so, I had to start over today :D
  190. Feb 08 17:51:29 <RhodiumToad> ppc123: ok. I need to take a break for a few minutes, I'll be back
  191. Feb 08 17:51:34 <ppc123> NP
  192. Feb 08 17:51:55 <ppc123> do I need any files from this thing - I'm on a zoom
  193. Feb 08 17:58:25 <RhodiumToad> ppc123: the plan of attack I'm thinking of is: create a new cluster with initdb (in a different dir), find the pg_database file using select pg_relation_filepath('pg_database'::regclass);
  194. Feb 08 17:58:56 <RhodiumToad> copying that in might get you access to template1 and/or postgres when going in with --single -P
  195. Feb 08 17:59:02 <ppc123> ok
  196. Feb 08 17:59:12 <RhodiumToad> then reindex system postgres;
  197. Feb 08 18:00:10 <RhodiumToad> then create a CSV file containing entries for your actual dbs, and insert it using COPY pg_database FROM 'file' WITH (format CSV, OIDS);
  198. Feb 08 18:00:31 <RhodiumToad> the data that goes into that file will take a bit of working out.
  199. Feb 08 18:00:50 <RhodiumToad> first question is do you know what encoding and locale the old system was using?
  200. Feb 08 18:02:08 <RhodiumToad> the lines of the CSV file should be something like:
  201. Feb 08 18:03:28 <RhodiumToad> nnn,db_nnn,10,6,"C","C",f,t,-1,12487,648,1,1663,
  202. Feb 08 18:03:56 <RhodiumToad> where nnn is the directory name from base/ for each directory >= 16384
  203. Feb 08 18:04:21 <RhodiumToad> the 6 needs to be changed if the encoding is not UTF8, and the "C" needs to be changed to the locale name if it's not C
  204. Feb 08 18:04:32 <RhodiumToad> (the trailing , is required)
  205. Feb 08 18:06:42 <ppc123> encoding should be UTF8
  206. Feb 08 18:07:15 <ppc123> Ok, I got a global from a good cluster
  207. Feb 08 18:07:20 <ppc123> \q
  208. Feb 08 18:09:15 <ppc123> good global is 12131
  209. Feb 08 18:09:30 <ppc123> do I drop that in the broken dir as 9191360 ?
  210. Feb 08 18:09:53 <RhodiumToad> yes
  211. Feb 08 18:12:34 <ppc123> ok, I am connected to the broken cluster, and see the db names now
  212. Feb 08 18:12:49 <RhodiumToad> just the postgres, template0, template1 dbs?
  213. Feb 08 18:13:18 <ppc123> well, I took the naming from an existing running cluster - so there are the other databases.
  214. Feb 08 18:13:31 <RhodiumToad> er
  215. Feb 08 18:13:52 <RhodiumToad> can you connect to any of them? because it's not likely that the oids are the same between clusters
  216. Feb 08 18:14:35 <ppc123> they are not the same OID's - testdb owner unknown (OID=292121233_
  217. Feb 08 18:15:46 <ppc123> I did not do the CSV part
  218. Feb 08 18:16:21 <RhodiumToad> I mean, if you try and connect to any of those dbs it would likely complain that the directory is missing, no?
  219. Feb 08 18:16:51 <RhodiumToad> filling in entries for the db oids you actually have is what the csv part is for
  220. Feb 08 18:20:01 <ppc123> ok, let me follow your instructions
  221. Feb 08 18:20:03 <RhodiumToad> I need to go eat now, back in maybe half an hour
  222. Feb 08 18:20:15 <ppc123> thanks for getting me this far
  223. Feb 08 18:50:21 <RhodiumToad> back
  224. Feb 08 18:58:30 <ppc123> so, what I've come up with
  225. Feb 08 18:59:41 <ppc123> https://controlc.com/cc01e189
  226. Feb 08 19:00:31 <ppc123> getting a ERROR: extra data after last expected column...
  227. Feb 08 19:00:47 <RhodiumToad> I think you missed one
  228. Feb 08 19:00:57 <RhodiumToad> what's that ' ' at the end?
  229. Feb 08 19:01:14 <ppc123> datacl
  230. Feb 08 19:01:19 <RhodiumToad> after 1663 there should be one comma and a newline
  231. Feb 08 19:01:25 <RhodiumToad> datacl should be left null for now
  232. Feb 08 19:01:39 <RhodiumToad> also that ; at the end is wrong
  233. Feb 08 19:01:48 <RhodiumToad> and there should be no ' quotes anywhere
  234. Feb 08 19:02:08 <RhodiumToad> the db_nnn does not require quotes
  235. Feb 08 19:02:13 <ppc123> oh
  236. Feb 08 19:03:30 <ppc123> that copied.
  237. Feb 08 19:04:00 <RhodiumToad> now the question is whether you can connect to any of those dbs
  238. Feb 08 19:05:06 <ppc123> I'm gonna try this 12GB one...
  239. Feb 08 19:06:44 <ppc123> hrm... might have my encoding wrong - BUT... I see tables
  240. Feb 08 19:07:22 <ppc123> ph - well meaybe not
  241. Feb 08 19:07:27 <RhodiumToad> bear in mind that there may be inconsistencies in the data
  242. Feb 08 19:07:32 <RhodiumToad> maybe not what?
  243. Feb 08 19:07:43 <ppc123> nope.. I'm rushing ahead.
  244. Feb 08 19:07:54 <ppc123> should I reindex?
  245. Feb 08 19:08:18 <peerce> anything that was being upodated during the backup is going to have a mix of old and new tuples and out of sync indexes
  246. Feb 08 19:08:36 <peerce> FK relations might be out of sync, too
  247. Feb 08 19:12:27 <RhodiumToad> ppc123: what are you seeing?
  248. Feb 08 19:13:43 <RhodiumToad> generally speaking false positives are only likely to happen on the table level
  249. Feb 08 19:14:06 <RhodiumToad> some kinds of queries have to SI lock the whole table (e.g. because it was seq-scanned)
  250. Feb 08 19:14:43 <RhodiumToad> generally the escalation path is individual index rows -> index blocks (tens or hundreds of adjacent rows) -> whole table,
  251. Feb 08 19:14:46 <ppc123> running a reindex
  252. Feb 08 19:14:52 <RhodiumToad> and individual table blocks -> whole table
  253. Feb 08 19:15:06 <RhodiumToad> ppc123: what did you _see_
  254. Feb 08 19:15:36 <ppc123> ERROR about missing base/
  255. Feb 08 19:15:45 <ppc123> when select * from table
  256. Feb 08 19:15:48 <RhodiumToad> missing base/WHAT?
  257. Feb 08 19:16:09 <RhodiumToad> and are you still using -P when connecting? (you should)
  258. Feb 08 19:26:44 <ppc123> ok - so I am seeeing - ERROR: missing chunk number - for toast value 70677992 in pg_toast_2619
  259. Feb 08 19:27:34 <ppc123> using index "pg_toast_2619_index" despite IgnoreSystemINdexes
  260. Feb 08 19:29:16 <ppc123> Not sure where I saw missing Base
  261. Feb 08 19:29:28 <RhodiumToad> fortunately that has an easy fix: add -O to the connect flags and do a truncate table pg_statistic;
  262. Feb 08 19:29:48 <RhodiumToad> (you don't need pg_statistic's contents)
  263. Feb 08 19:31:57 <ppc123> done
  264. Feb 08 19:39:18 <ppc123> this server is slow...
  265. Feb 08 19:40:12 <RhodiumToad> -P doesn't help with that, but probably best to stick with it (unless you already did a reindex system in this db)
  266. Feb 08 19:40:40 <ppc123> I'll let it finish it's thing
  267. Feb 08 19:40:56 <ppc123> it's still in P mode.
  268. Feb 08 19:42:36 <RhodiumToad> -P shouldn't make a big difference to access to actual data, but it does affect system catalog lookups
  269. Feb 08 19:52:52 <ppc123> hrm... missing chunk number 0 for toast value 71436530 in pg_toast_2619
  270. Feb 08 19:53:18 <RhodiumToad> is the same db as before?
  271. Feb 08 19:53:23 <ppc123> yes
  272. Feb 08 19:53:24 <peerce> oh yeah, add toast tables to the things that will be out of sync if there were active updates/inserts during the interval the backup took
  273. Feb 08 19:53:36 <RhodiumToad> the truncate table pg_statistic; has to be run in the same db that gets the error
  274. Feb 08 19:54:26 <RhodiumToad> after doing that truncate, it should be impossible to get errors on pg_toast_2619
  275. Feb 08 19:55:01 <RhodiumToad> peerce: fortunately, pg_toast_2619 is the toast table for pg_statistic, which doesn't contain anything important
  276. Feb 08 19:56:04 <ppc123> how do you know that?
  277. Feb 08 19:56:10 <ppc123> from the dir listing?
  278. Feb 08 19:56:24 <RhodiumToad> 2619 is the oid of pg_statistic.
  279. Feb 08 19:56:32 <ppc123> ok
  280. Feb 08 19:56:34 <RhodiumToad> select 'pg_statistic'::regclass::oid;
  281. Feb 08 19:57:01 <RhodiumToad> oids < 10000 have fixed meanings for a given major version of pg
  282. Feb 08 19:57:26 <RhodiumToad> oids 10000 to 16383 are allocated during initdb, so they're system objects but not necessarily fixed
  283. Feb 08 19:58:12 <ppc123> hrm... I am gettin a hcaracter with byte sequence 0xef 0xbc in encoding UTF8 has no equivaleent in WIN1252
  284. Feb 08 19:58:28 <ppc123> wonder if I need to select SQLASCII
  285. Feb 08 19:58:33 <RhodiumToad> set your client encoding to UTF8
  286. Feb 08 19:58:41 <RhodiumToad> oh
  287. Feb 08 19:58:43 <RhodiumToad> hm
  288. Feb 08 19:59:02 <RhodiumToad> you may have the encoding wrong (remember what I said about that "6" in pg_database?)
  289. Feb 08 19:59:08 <ppc123> yep.
  290. Feb 08 19:59:21 <RhodiumToad> if you're actually using win1252 and not utf8, that 6 needs to be something else
  291. Feb 08 19:59:40 <ppc123> let me check this other cluster.
  292. Feb 08 19:59:41 <RhodiumToad> you copied this pg_database from a working system? what are the values of "datencoding" in pg_database?
  293. Feb 08 20:01:58 <ppc123> neat... wonder if it's windows terminal client...
  294. Feb 08 20:02:50 <RhodiumToad> psql will set a client_encoding corresponding to its guess at the display it's running in
  295. Feb 08 20:02:56 <RhodiumToad> which on windows may be completely wrong
  296. Feb 08 20:03:07 <RhodiumToad> setting the client encoding to sql_ascii may help
  297. Feb 08 20:04:32 <ppc123> yeah - PGadmin selects the data fine
  298. Feb 08 20:04:45 <ppc123> datencoding is Windows...1525
  299. Feb 08 20:05:24 <ppc123> English_United States.1252 - likely the EDB PG Installer default
  300. Feb 08 20:05:47 <ppc123> I think that is fine - just a problem from CLI
  301. Feb 08 20:05:51 <RhodiumToad> no
  302. Feb 08 20:06:20 <RhodiumToad> if you used 6 in pg_database's encoding column (in that CSV data), that specifies UTF8, not win1252
  303. Feb 08 20:06:20 <ppc123> OK, I will try sql_ascii
  304. Feb 08 20:06:28 <ppc123> oh
  305. Feb 08 20:06:43 <RhodiumToad> it's important that the value of datencoding actually match the stored data (obviously!)
  306. Feb 08 20:06:53 <RhodiumToad> so you may need to modify those
  307. Feb 08 20:07:43 <RhodiumToad> (you can do an update on pg_database to change them)
  308. Feb 08 20:07:57 <RhodiumToad> but after doing that, disconnect and reconnect, and reindex
  309. Feb 08 20:08:15 <ppc123> So, when psql connects - says... Console code page (437) differs from Windows (1252) -
  310. Feb 08 20:09:12 <ppc123> 'terminal' code page does say 437 (OEM - United States) on this windows server.
  311. Feb 08 20:09:25 <ppc123> sorry - Command Prompt Properties
  312. Feb 08 20:10:22 <ppc123> And - for this application the encoding should be UTF-8 - if they at some point installed PG without RTFM, it could have been SQL-ASCII back in pg8.x days
  313. Feb 08 20:20:16 <RhodiumToad> ppc123: did you check the value of datencoding in pg_database on a working system
  314. Feb 08 20:21:37 <ppc123> I did - it is what it *should* be, but that doesn't mean the data is 'correct' - It's not all rows, it's like 1 row that has this problem
  315. Feb 08 20:22:06 <ppc123> and that is probably the client applications' fault, or some old issue
  316. Feb 08 20:22:22 <ppc123> I need to see if a backup/restore works.
  317. Feb 08 20:26:57 <RhodiumToad> ppc123: yes, you should be aiming to do a pg_dump and then restore that into a fresh initdb elsewhere
  318. Feb 08 20:28:53 <ppc123> yep
  319. Feb 08 21:58:57 <ppc123> RhodiumToad - thanks for all your guidance on this - backup worked, waiting for restore to finish.
  320. Feb 08 21:59:14 <RhodiumToad> just remember that the data may be inconsistent
  321. Feb 08 21:59:42 <ppc123> we at least have readable tables that we can export from
  322. Feb 08 22:00:46 <ppc123> it will be put into a freshly init'd structure
  323. Feb 08 22:01:02 <ppc123> and tested

Edit this Snippet