Often you need to split a line by a textual pattern. This page shows you how.
I have a file that is a translation of Chinese lyrics. It is formatted like this:
你是我最苦澀的等待 | you are my hardest wait
讓我歡喜又害怕未來 | giving me joy and also fear the future
The left side is Chinese and the right side is English. I want to write a program that splits each line, so that I get the whole Chinese part or the whole English part.
Here's the code:
# -*- coding: utf-8 -*-
# python 2

import re

myText = ur"""你是我最苦澀的等待 | you are my hardest wait
讓我歡喜又害怕未來 | giving me joy and also fear the future"""

# split into lines
myLines = re.split(r'\n', myText)

for aLine in myLines:
    # split on "|" with optional whitespace around it.
    # flags must be passed by keyword: the third positional argument is maxsplit
    lineParts = re.split(r'\s*\|\s*', aLine, flags=re.U)
    print lineParts[0].encode('utf-8')

# prints:
# 你是我最苦澀的等待
# 讓我歡喜又害怕未來
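For readers on Python 3, here is a minimal sketch of the same idea. It is not the code from this page: the function name `split_bilingual` and the constant `SEPARATOR` are made up for illustration, and the sketch assumes each line contains at least one “|”. In Python 3, string patterns match Unicode by default, so the `re.U` flag is not needed.

```python
import re

# Sample data in the same format as the lyrics file described above.
my_text = """你是我最苦澀的等待 | you are my hardest wait
讓我歡喜又害怕未來 | giving me joy and also fear the future"""

# A "|" with optional whitespace on either side (hypothetical name).
SEPARATOR = re.compile(r"\s*\|\s*")


def split_bilingual(text):
    """Return a list of (chinese, english) tuples, one per usable line.

    Hypothetical helper: lines that are blank or have no "|" are skipped.
    """
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # maxsplit=1 keeps any later "|" inside the English part.
        parts = SEPARATOR.split(line, maxsplit=1)
        if len(parts) != 2:
            continue  # no separator found on this line
        pairs.append((parts[0], parts[1]))
    return pairs


pairs = split_bilingual(my_text)
for chinese, english in pairs:
    print(chinese)
```

Each tuple holds the Chinese part first and the English part second, so printing `english` instead gives the other column. Using `maxsplit=1` is a design choice for this sketch; drop it if a line can hold more than two fields.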
See also: String Pattern Matching (regex) Documentation ◇ Unicode in Perl & Python.
In Perl, to split a line into a list using a text pattern as the separator, use the function “split”. Here's a basic example:
# -*- coding: utf-8 -*-
# perl

use strict;

my $myText = '你是我最苦澀的等待 | you are my hardest wait
讓我歡喜又害怕未來 | giving me joy and also fear the future';

my @myLines = split(/\n/, $myText);

# use Data::Dumper;
# print @myLines;

for my $aLine (@myLines) {
    my @lineParts = split(/\s*\|\s*/, $aLine);
    print "$lineParts[0]\n";
}

# prints:
# 你是我最苦澀的等待
# 讓我歡喜又害怕未來