	ww520 4 hours ago
		Strings in a textual index are already compressed, with common-prefix compression or other schemes, and they remain perfectly queryable. I'm not sure whether their compression scheme is meant for index columns or data columns. A global column dictionary adds more complexity than usual: a lookup now touches the dictionary pages in addition to the index pages and data pages. The dictionary entries are sorted, so you also have to worry about page expansion and contraction when entries are added or removed. They sidestep these problems by making the dictionary immutable, presumably building it up front by scanning all the data. I'm not sure why FSST is better than a standard compression algorithm for compressing the dictionary entries. Storing the strings themselves as dictionary IDs is a good idea, since fixed-width IDs can be processed quickly with SIMD.
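
To make the dictionary-ID point concrete, here is a minimal sketch of a sorted, immutable column dictionary built by one scan of the data. It is not the scheme from the article, and every name in it (`build_dictionary`, `encode`, `filter_eq`, `filter_range`) is made up for illustration. It leaves out FSST and any compression of the dictionary entries themselves, and only shows why sorted IDs let string predicates become integer comparisons:

```python
from bisect import bisect_left, bisect_right


def build_dictionary(values):
    """Scan the whole column once and return a sorted, immutable dictionary."""
    return tuple(sorted(set(values)))


def encode(values, dictionary):
    """Replace each string with its ID, i.e. its position in the sorted dictionary."""
    return [bisect_left(dictionary, v) for v in values]


def filter_eq(ids, dictionary, needle):
    """Equality predicate: one dictionary lookup, then integer comparisons only."""
    target = bisect_left(dictionary, needle)
    if target == len(dictionary) or dictionary[target] != needle:
        return []  # the string does not occur in the column at all
    return [row for row, id_ in enumerate(ids) if id_ == target]


def filter_range(ids, dictionary, low, high):
    """Range predicate low <= s <= high, mapped to an ID interval [lo, hi).

    This works only because the dictionary is sorted, so ID order
    matches string order.
    """
    lo = bisect_left(dictionary, low)
    hi = bisect_right(dictionary, high)
    return [row for row, id_ in enumerate(ids) if lo <= id_ < hi]


column = ["pear", "apple", "fig", "apple", "pear"]
dictionary = build_dictionary(column)   # ('apple', 'fig', 'pear')
ids = encode(column, dictionary)        # [2, 0, 1, 0, 2]
```

The row loops here are plain Python. In a real engine the ID column would be a packed integer array, and the `id_ == target` and `lo <= id_ < hi` comparisons are the part that vectorizes well. The sketch also shows the cost of sortedness: inserting a new string would shift the IDs of every entry after it, which is presumably why an immutable dictionary is attractive.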