#1
Is there a way, using Windows freeware, to sort unique a huge hosts file?

I've concatenated all the freeware Windows hosts files I can find into a single huge fifty-thousand-line C:\Windows\System\Drivers\Etc\hosts file, but the resulting hosts file is so huge, replete with duplicates, that it's slowing down Windows browsing.

I would like to pare the hosts file to remove duplicates. How?

I tried sorting with Windows vim 7.1 freeware but I can't get the unique sort option to work inside of vim. What am I doing wrong?

Here is a vim 7.1 command that works inside the huge hosts file:
  :%!sort       (this sorts the huge Windows hosts file just fine)

This vim 7.1 sort unique command should work but it does not:
  :%!sort -u    (this is supposed to sort uniquely)

The syntax is:
  <esc>:        (begin a Windows vim 7.1 command)
  !sort -u      (run the following command "sort -u" inside of vim freeware)

When I run "<esc>:!sort -u" inside of vim, it pares the hosts file down to a single (empty) line.

Is there another free way to sort uniquely a large Windows text file?

#2
On Sat, 2 Aug 2008 11:00:26 -0700, Donita Luddington wrote:

> This vim 7.1 sort unique command should work but it does not:
> :%!sort -u (this is supposed to sort uniquely)

This syntax means: execute the external sort command with the unique option. It will most likely trigger Windows' own sort command, which has no -u option. Therefore, it fails. Get a sort.exe which supports -u as part of the UnxUtils package:

http://sourceforge.net/projects/unxutils

An updated version is part of UnxUpdates.zip, which at the moment is not available via SourceForge. So you may get it here:

http://www.weihenstephan.de/~syring/...UtilsDist.html

While you're at it, you can execute it directly, giving your file as the input source. (No need to bother Vim with this task.)

If you start using UnxUtils on a more regular basis (they are worthwhile), you may consider copying all *.exe files into a directory and adding it to your search path. I explained the how-to only yesterday. So you may have a look at:

Message-ID: <msm167avz8w.dlg@br.ederson.news.arcor.de>

or look the posting up through Google Groups:

http://groups.google.de/group/alt.co...621001ce95ea22

HTH.
BeAr
-- 
===========================================================================
= What do you mean with: "Perfection is always an illusion"?              =
===============================================================--(Oops!)===

#3
[FollowUp-To: alt.comp.freeware]

Donita Luddington wrote:

> Newsgroups: alt.comp.freeware,comp.unix.questions,comp.editors
>

When you do crosspost you should set followup-to

> From: Donita Luddington <doniludd@sbcglobal.net>
> NNTP-Posting-Host: 69.110.47.74
> Message-ID: <Oi1lk.6166$np7.5749@flpi149.ffdc.sbc.com>
>
> Is there a way, using Windows freeware, to sort unique a huge
> hosts file?
>
> I've concatenated all the freeware Windows hosts files I can find
> into a single huge fifty-thousand-line
> C:\Windows\System\Drivers\Etc\hosts file, but the resulting hosts
> file is so huge, replete with duplicates, that it's slowing down
> Windows browsing.
>
> I would like to pare the hosts file to remove duplicates. How?
>

Start by pushing the concatenated file through rpsort:

  rpsort /q /d /n < infile > outfile

Then view outfile for "semi-duplicate" entries to determine whether further processing is necessary, e.g. differences in spacing, comments, or... something else.

-- 
OpenPGP: id=18795161E22D3905; preference=signencrypt; url=http://guysalias.fateback.com/pgpkeys.txt

#4
"Donita Luddington" <doniludd@sbcglobal.net> wrote in message news:Oi1lk.6166$np7.5749@flpi149.ffdc.sbc.com...

> Is there a way, using Windows freeware, to sort unique a huge hosts file?
>
> I've concatenated all the freeware Windows hosts files I can find into a
> single huge fifty-thousand-line C:\Windows\System\Drivers\Etc\hosts file,
> but the resulting hosts file is so huge, replete with duplicates, that
> it's slowing down Windows browsing.
>
> I would like to pare the hosts file to remove duplicates. How?
>
> I tried sorting with Windows vim 7.1 freeware but I can't get the unique
> sort option to work inside of vim. What am I doing wrong?
>
> Here is a vim 7.1 command that works inside the huge hosts file:
> :%!sort (this sorts the huge Windows hosts file just fine)
>
> This vim 7.1 sort unique command should work but it does not:
> :%!sort -u (this is supposed to sort uniquely)
>
> The syntax is:
> <esc>: (begin a Windows vim 7.1 command)
> !sort -u (run the following command "sort -u" inside of vim freeware)
>
> When I run "<esc>:!sort -u" inside of vim, it pares the hosts file down to
> a single (empty) line.
>
> Is there another free way to sort uniquely a large Windows text file?

I'd use a Linux VM. Then it's a simple

  tr '[A-Z]' '[a-z]' < file.txt | sort | uniq > file2.txt

#5
On Sat, 2 Aug 2008 20:01:20 +0100, John Stubbings wrote:

> tr '[A-Z]' '[a-z]' < file.txt | sort | uniq > file2.txt

Hey John,

I guess your command does this:

  tr           (translate character by character)
  '[A-Z]'      (from any character in the set of capital A to capital Z)
  '[a-z]'      (to the corresponding character in the set of small a to small z)
  < file.txt   (from the input file "file.txt")
  |            (pipe that output to the next command)
  sort         (sort those results alphabetically)
  |            (pipe that output to the next command)
  uniq         (uniquify the results, line by line)
  > file2.txt  (and save as "file2.txt")

If I had Linux installed, I'd run that in a flash, but I don't have any Linux and, I guess, it would take many hours to install, so I'm looking for a native Windows freeware solution instead.

There must be a way to uniquify a file from within vi freeware on Windows.
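For readers without the Unix tools at hand, the pipeline walked through above can be approximated in a few lines of Python. This is only a sketch under stated assumptions, not what anyone in the thread ran: the function names are made up for illustration, `str.lower()` also lowercases non-ASCII letters (unlike `tr '[A-Z]' '[a-z]'`), and Python sorts by code point, which may differ from the ordering of a locale-aware `sort`.

```python
def sort_unique_lowercase(lines):
    """Lowercase every line, drop duplicates, and return the lines sorted.

    Rough equivalent of: tr '[A-Z]' '[a-z]' | sort | uniq
    """
    # A set keeps one copy of each line; sorted() orders them by code point.
    return sorted({line.rstrip("\r\n").lower() for line in lines})


def dedupe_file(src, dst):
    """Hypothetical helper: read src, write the sorted unique lines to dst."""
    with open(src, encoding="utf-8", errors="replace") as infile:
        unique_lines = sort_unique_lowercase(infile)
    with open(dst, "w", encoding="utf-8") as outfile:
        outfile.write("\n".join(unique_lines) + "\n")
```

Calling `dedupe_file("file.txt", "file2.txt")` would mirror the file names used in the quoted command. Stripping `\r\n` before comparing means lines that differ only in their line ending count as duplicates.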

#6
On Sat, 2 Aug 2008 20:29:35 +0200, B. R. 'BeAr' Ederson wrote:

>> This vim 7.1 sort unique command should work but it does not:
>> :%!sort -u (this is supposed to sort uniquely)
>
> This syntax means: execute external sort command with the unique
> option. It will most likely trigger Windows own sort command,
> which has no -u option. Therefore, it fails.

Hey BeAr,

I did not realize this was calling a Windows "sort.exe" command. I thought it was calling a "vim" sort command.

I just looked to see if the Windows sort command has a "unique" option.

  c:\> sort /? > c:\temp\sort.txt
  c:\> type c:\temp\sort.txt

  SORT [/R] [/+n] [/M kilobytes] [/L locale] [/REC recordbytes]
    [[drive1:][path1]filename1] [/T [drive2:][path2]]
    [/O [drive3:][path3]filename3]

  /+n
      Specifies the character number, n, to begin each comparison. /+3 indicates that each comparison should begin at the 3rd character in each line. Lines with fewer than n characters collate before other lines. By default comparisons start at the first character in each line.

  /L[OCALE] locale
      Overrides the system default locale with the specified one. The "C" locale yields the fastest collating sequence and is currently the only alternative. The sort is always case insensitive.

  /M[EMORY] kilobytes
      Specifies amount of main memory to use for the sort, in kilobytes. The memory size is always constrained to be a minimum of 160 kilobytes. If the memory size is specified the exact amount will be used for the sort, regardless of how much main memory is available. The best performance is usually achieved by not specifying a memory size. By default the sort will be done with one pass (no temporary file) if it fits in the default maximum memory size, otherwise the sort will be done in two passes (with the partially sorted data being stored in a temporary file) such that the amounts of memory used for both the sort and merge passes are equal. The default maximum memory size is 90% of available main memory if both the input and output are files, and 45% of main memory otherwise.

  /REC[ORD_MAXIMUM] characters
      Specifies the maximum number of characters in a record (default 4096, maximum 65535).

  /R[EVERSE]
      Reverses the sort order; that is, sorts Z to A, then 9 to 0.

  [drive1:][path1]filename1
      Specifies the file to be sorted. If not specified, the standard input is sorted. Specifying the input file is faster than redirecting the same file as standard input.

  /T[EMPORARY] [drive2:][path2]
      Specifies the path of the directory to hold the sort's working storage, in case the data does not fit in main memory. The default is to use the system temporary directory.

  /O[UTPUT] [drive3:][path3]filename3
      Specifies the file where the sorted input is to be stored. If not specified, the data is written to the standard output. Specifying the output file is faster than redirecting standard output to the same file.

You're right. There is no "-unique" option to the Windows sort command.

There must be a unique option inside of a Windows text editor somewhere!

#7
Donita Luddington:

> Is there another free way to sort uniquely a large Windows text file?

Notepad++ with the TextFX plug-in. I am not sure about the maximum file size it can handle, though.

http://notepad-plus.sourceforge.net/uk/site.htm

Yrrah

#8
Donita Luddington wrote:

> The syntax is:
> <esc>: (begin a Windows vim 7.1 command)
> !sort -u (run the following command "sort -u" inside of vim freeware)
>

If you want to stick with vi, try...

  :%!sort |uniq

(yes, there is a space between the external "sort" and the pipe "|")

But you may still have "semi-duplicate" lines, e.g.

  0.0.0.0 example.com
  0.0.0.0    example.com
  0.0.0.0 example.com #bad domain

-- 
OpenPGP: id=18795161E22D3905; preference=signencrypt; url=http://guysalias.fateback.com/pgpkeys.txt
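The "semi-duplicate" problem mentioned above (entries that differ only in spacing or in a trailing comment) survives any plain line-by-line unique sort. One way to catch such entries, sketched here in Python as an illustration only, is to normalize each line before comparing. The function names are hypothetical, and two design choices are assumptions rather than anything the thread prescribes: trailing comments are discarded, and the first-seen order is kept instead of sorting.

```python
def normalize_hosts_line(line):
    """Reduce a hosts line to 'ip name [name...]', or None if it has no entry.

    Comments (everything after '#') are dropped, runs of spaces or tabs are
    collapsed, and host names are lowercased because they are case-insensitive.
    """
    fields = line.split("#", 1)[0].split()
    if len(fields) < 2:
        return None  # blank line, comment-only line, or malformed entry
    address, names = fields[0], fields[1:]
    return " ".join([address] + [name.lower() for name in names])


def unique_hosts_entries(lines):
    """Return normalized entries with duplicates removed, in first-seen order."""
    seen = set()
    entries = []
    for line in lines:
        key = normalize_hosts_line(line)
        if key is not None and key not in seen:
            seen.add(key)
            entries.append(key)
    return entries
```

With this normalization, the three example lines in the post collapse to the single entry `0.0.0.0 example.com`. Anyone who wants to keep annotations such as `#bad domain` would need to carry the comment along instead of discarding it.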

#9
On Sat, 2 Aug 2008 12:35:29 -0700, Donita Luddington wrote:

> There must be a way to uniquify a file from within vi freeware on Windows.

I found these pointers for removing duplicate lines in vi:

http://rayninfo.co.uk/vimtips.html
  :%s/^\(.*\)\n\1$/\1/    : delete duplicate lines

http://www.vim.org/tips/tip.php?tip_id=305
  :%s/^\(.*\)\n\1/\1$/    delete duplicate lines

But, executed in vim 7.1 on Windows, this syntax returns an error.

I'll try some more after I figure out what it's trying to do. I think it's trying to do this:

  :%s/     ==> search the whole file for
  ^        ==> the beginning of a line
  \(.*\)   ==> any character and any number of characters after that
  \n       ==> a newline character
  \1       ==> what you found the first time (i.e., the whole line)
  /        ==> replace it with
  \1       ==> what you found that first time (again, the original line)
  $        ==> I'm not sure why this "end of line" character is needed
  /        ==> I'm not sure why this "end of command" character is needed

Before I debug why removal of duplicate lines in Lemmy and Vim freeware isn't working, can you tell me if I have the command to do so right?
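The idea behind the first substitute command quoted above (match a line, a newline, and the same line again, then keep one copy) can be tried out with Python's `re` module. This is a hedged sketch, not a translation of either Vim tip: the function name is made up, and the pattern adds a `+` so that runs of three or more identical lines collapse in one pass. In this form the `$` belongs inside the pattern, where it anchors the end of the repeated line, so that `a` followed by `ab` is not treated as a duplicate.

```python
import re

# ^(.*)      one whole line (the dot never matches a newline)
# (?:\n\1)+  one or more immediate repeats of that same line
# $          the last repeat must end at a line boundary
ADJACENT_DUPLICATES = re.compile(r"^(.*)(?:\n\1)+$", re.MULTILINE)


def collapse_adjacent_duplicates(text):
    """Replace each run of identical consecutive lines with a single copy."""
    return ADJACENT_DUPLICATES.sub(r"\1", text)
```

Like the Vim substitution, this only removes duplicates that sit directly next to each other, so the text has to be sorted first for it to remove all of them.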

#10
On Sat, 2 Aug 2008 12:35:29 -0700, Donita Luddington wrote:

> If I had Linux installed, I'd run that in a flash, but I don't have any
> Linux and, I guess, it would take many hours to install, so I'm looking for
> a native Windows freeware solution instead.
>
> There must be a way to uniquify a file from within vi freeware on Windows.

As I wrote: Get an appropriate sort command. You can even replace the Windows one, as long as you don't need exactly this implementation for some purpose. (Better to set up a dedicated UnxUtils directory with an entry in the search path, though.)

Btw.: All the tools John suggested are part of the UnxUtils package. So it shouldn't take "hours" to follow his good advice... ;-)

Notepad++ - as Yrrah suggested - should do fine in general. But it gets awfully slow when sorting huge files. That's the case with most internal editor solutions. The UnxUtils variant (either sort -u alone or, better, with the input piped through tr beforehand, as John suggested) should take barely a second.

When the UnxUtils sort is found in the search path *before* the Windows one (or *instead*, if you just replace it), your Vim command should do fine. (If it has to be an "editor solution".)

BeAr
-- 
===========================================================================
= What do you mean with: "Perfection is always an illusion"?              =
===============================================================--(Oops!)===