
Python & Perl: Splitting a Line by Regex

Xah Lee

Often you need to split a line by a textual pattern. This page shows you how.

Python

I have a file that is a translation of Chinese lyrics. It is formatted like this:

你是我最苦澀的等待   |   you are my hardest wait
讓我歡喜又害怕未來   |   giving me joy and also fear the future

The left side is Chinese and the right side is English. I want a program that splits each line, so that I get either the whole Chinese part or the whole English part.

Here's the code (Python 2, as the ur"" string prefix and the print statement show):

# -*- coding: utf-8 -*-
# python

import re

myText = ur"""你是我最苦澀的等待   |   you are my hardest wait
讓我歡喜又害怕未來   |   giving me joy and also fear the future"""
 
# split into lines
myLines = re.split(r'\n', myText)

for aLine in myLines:
    # pass re.U by keyword (Python 2.7+); the third positional argument is maxsplit
    lineParts = re.split(r'\s*\|\s*', aLine, flags=re.U)
    print lineParts[0].encode('utf-8')

# prints:
# 你是我最苦澀的等待
# 讓我歡喜又害怕未來
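
The code above prints only the Chinese side. As a minimal sketch of getting both sides at once, here is a Python 3 variant. In Python 3, string patterns are Unicode-aware by default, so no re.U flag is needed. The helper name split_lyric_line is hypothetical, and the sketch assumes every line contains the "|" separator.

```python
import re

# Sample text from this page; each line is "Chinese | English".
my_text = """你是我最苦澀的等待   |   you are my hardest wait
讓我歡喜又害怕未來   |   giving me joy and also fear the future"""

# Separator: a vertical bar with optional whitespace on either side.
separator = re.compile(r'\s*\|\s*')


def split_lyric_line(line):
    """Return (chinese, english) for one line (hypothetical helper).

    Assumes the line contains the separator; maxsplit=1 keeps any
    later "|" characters inside the English part.
    """
    chinese, english = separator.split(line, maxsplit=1)
    return chinese, english


pairs = [split_lyric_line(line) for line in my_text.splitlines()]

for chinese, english in pairs:
    print(chinese, "=>", english)

# prints:
# 你是我最苦澀的等待 => you are my hardest wait
# 讓我歡喜又害怕未來 => giving me joy and also fear the future
```

Use pairs[0][0] for the first Chinese line, or pairs[0][1] for its English translation.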

See also: String Pattern Matching (regex) Documentation and Unicode in Perl & Python.
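
One detail of re.split is easy to get wrong: its signature is re.split(pattern, string, maxsplit=0, flags=0), so a flag passed in the third position is read as maxsplit, not as a flag. The flag constants are plain integers underneath (re.I is 2, re.U is 32), so no error is raised. The Python 3 sketch below illustrates the difference with a made-up sample line. It passes maxsplit by keyword to show what the positional form amounts to.

```python
import re

line = "a | b | c | d"        # made-up sample with three separators
pattern = r'\s*\|\s*'

# A flag constant is an integer underneath; re.I is 2.
limit = int(re.I)

# What passing re.I in the third position amounts to:
# at most `limit` splits are made, and no flag is set.
as_maxsplit = re.split(pattern, line, maxsplit=limit)

# Passing the flag by keyword splits at every separator.
as_flags = re.split(pattern, line, flags=re.I)

print(as_maxsplit)  # ['a', 'b', 'c | d']
print(as_flags)     # ['a', 'b', 'c', 'd']
```

With re.U the limit would be 32, so the mistake goes unnoticed on lines that have fewer than 32 separators.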

Perl

To split a line into a list using a text pattern as the separator, use the function “split”. Here's a basic example:

# -*- coding: utf-8 -*-
# perl

use strict;

my $myText = '你是我最苦澀的等待   |   you are my hardest wait
讓我歡喜又害怕未來   |   giving me joy and also fear the future';

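# split into lines
my @myLines = split(/\n/, $myText);

# to inspect the list while debugging:
# use Data::Dumper;
# print Dumper(\@myLines);

for my $aLine (@myLines) {
    # split on a vertical bar with optional whitespace around it
    my @lineParts = split(/\s*\|\s*/, $aLine);
    print "$lineParts[0]\n";
}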

# prints:
# 你是我最苦澀的等待
# 讓我歡喜又害怕未來