Duplicate lines are a common data quality problem. They appear in keyword lists, email address exports, log files, CSV exports, and any text file that has been merged, copied, or updated incrementally over time. Removing duplicates is a routine cleaning step before importing data, publishing a list, or processing text programmatically.
Using an online tool
The fastest approach for a one-off task is the Remove Duplicate Lines tool on TextUtils. Paste your text and duplicate lines are removed instantly in your browser, keeping the first occurrence of each line and preserving the original order. Nothing is sent to any server.
Using the command line (Unix/macOS/Linux)
The Unix toolkit has two classic tools for this:
# Sort the input so duplicates become adjacent, then let uniq drop them
sort input.txt | uniq > output.txt

# Remove duplicates while preserving the original order (works in any POSIX awk)
awk '!seen[$0]++' input.txt > output.txt
sort | uniq is fast and simple but sorts the output, losing the original order. awk '!seen[$0]++' preserves the first occurrence of each line in its original position — the seen array tracks which lines have been encountered, and ! inverts the boolean to print only lines not yet seen.
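If sorted output is acceptable anyway, the standard -u flag of sort collapses the pipeline into a single command with the same effect:

sort -u input.txt > output.txt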
Using Python
For programmatic use or integration into a larger pipeline:
with open('input.txt') as f:
    lines = f.readlines()

seen = set()
unique = []
for line in lines:
    key = line.strip()          # compare lines with surrounding whitespace removed
    if key not in seen:
        seen.add(key)
        unique.append(line)     # keep the original line, whitespace and all

with open('output.txt', 'w') as f:
    f.writelines(unique)
This preserves the original order and ignores surrounding whitespace when comparing, while the lines written out keep their original form. Change line.strip() to line.rstrip('\n') if leading or trailing spaces should make lines count as distinct.
Using JavaScript
const text = "line1\nline2\nline1\nline3";
const unique = [...new Set(text.split('\n'))].join('\n');
// "line1\nline2\nline3"
Note: Set membership uses the SameValueZero comparison, which for strings behaves like strict equality (===), so "line1 " (trailing space) and "line1" are treated as different lines. Trim lines first if you want whitespace-insensitive deduplication, as in the sketch below.
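One way to trim only for the comparison while still emitting the original lines (the seen and deduped names here are illustrative):

const seen = new Set();
const deduped = text.split('\n')
  .filter(line => {
    const key = line.trim();          // trimmed copy used only as the comparison key
    if (seen.has(key)) return false;  // duplicate under trimming: drop it
    seen.add(key);
    return true;                      // first occurrence: keep the original line
  })
  .join('\n');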
Using Microsoft Excel or Google Sheets
If your data is in a spreadsheet: paste lines into column A, then use Data → Remove Duplicates (Excel) or Data → Data cleanup → Remove duplicates (Google Sheets). This removes rows with identical cell values and is the most accessible approach for non-technical users.
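A formula-based alternative that leaves the original column untouched: both Google Sheets and Excel 365 provide a UNIQUE function, which spills the distinct values into a new range (A1:A100 below is just an example range):

=UNIQUE(A1:A100)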
Case sensitivity considerations
Most tools treat Hello and hello as different lines. If you want case-insensitive deduplication — keeping only one of ERROR and error — you need to normalize before comparing. In Python: compare line.strip().lower() but keep the original case in the output. In awk: awk '!seen[tolower($0)]++'.
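A minimal Python sketch of that pattern, reusing the input.txt and output.txt names from the earlier example; the lowercased key is used only for comparison, so whichever casing appears first is the one that survives:

seen = set()
with open('input.txt') as src, open('output.txt', 'w') as dst:
    for line in src:
        key = line.strip().lower()   # normalized key: case- and whitespace-insensitive
        if key not in seen:
            seen.add(key)
            dst.write(line)          # write the first occurrence with its original casing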