I'm trying to create a program that searches XML files for nodes of the form <disp-formula id="deqnX-Y"> and builds a dictionary whose keys are rid="deqnX" … rid="deqnY" (X is incremented by 1 until it reaches Y) and whose values are all rid="deqnX-Y". I can then do a simple search and replace using the dictionary to change the link nodes. For example, if the file has nodes like <disp-formula id="deqn5-7">, <disp-formula id="deqn9-10">, <disp-formula id="deqn3a-3c">, <disp-formula id="deqn4p-5b">
and there are link nodes in the form
<xref ref-type="disp-formula" rid="deqn5"> <xref ref-type="disp-formula" rid="deqn6"> <xref ref-type="disp-formula" rid="deqn10"> <xref ref-type="disp-formula" rid="deqn5b">
they should be changed to
<xref ref-type="disp-formula" rid="deqn5-7"> <xref ref-type="disp-formula" rid="deqn5-7"> <xref ref-type="disp-formula" rid="deqn9-10"> <xref ref-type="disp-formula" rid="deqn4p-5b">
I also want the program to ignore nodes like <disp-formula id="deqn5-7c"> and/or <disp-formula id="deqn2a-4"> in the file.
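The mapping rules above can be sketched as a small pure function, which may make them easier to test separately from the file handling. This is only an illustration: `RangeIds` and `Expand` are hypothetical names introduced here, and the sketch assumes a letter suffix is always a single lowercase letter.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class RangeIds
{
    // deqn<number><optional letter>-<number><optional letter>
    static readonly Regex Pattern =
        new Regex(@"^deqn(\d+)([a-z]?)-(\d+)([a-z]?)$", RegexOptions.Compiled);

    // Returns every individual id covered by a range id such as "deqn5-7"
    // or "deqn4p-5b". Returns nothing for ids that should be ignored.
    public static IEnumerable<string> Expand(string id)
    {
        Match m = Pattern.Match(id);
        if (!m.Success) yield break;

        int first = int.Parse(m.Groups[1].Value);
        int last = int.Parse(m.Groups[3].Value);
        string startLetter = m.Groups[2].Value;
        string endLetter = m.Groups[4].Value;

        // Mixed forms such as "deqn5-7c" or "deqn2a-4" are ignored.
        if ((startLetter.Length == 0) != (endLetter.Length == 0)) yield break;

        if (startLetter.Length == 0)
        {
            // Purely numeric range: deqn5, deqn6, deqn7
            for (int i = first; i <= last; i++) yield return "deqn" + i;
            yield break;
        }

        // Number + letter range: the first number starts at its own letter,
        // later numbers start at 'a'; only the last number stops early.
        for (int i = first; i <= last; i++)
        {
            char from = (i == first) ? startLetter[0] : 'a';
            char to = (i == last) ? endLetter[0] : 'z';
            for (char c = from; c <= to; c++) yield return "deqn" + i + c;
        }
    }
}
```

Under these assumptions, `RangeIds.Expand("deqn5-7")` yields deqn5, deqn6 and deqn7, while `RangeIds.Expand("deqn5-7c")` and `RangeIds.Expand("deqn2a-4")` yield nothing.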
This is the code I'm using for now:
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Text.RegularExpressions;
    using System.Windows.Forms;
    using System.Xml.Linq;

    void Button1Click(object sender, EventArgs e)
    {
        string active_filename = "";
        try
        {
            // Every *.xml file under any directory named "xml" below the chosen root.
            string[] path = Directory.GetDirectories(textBox1.Text, "xml", SearchOption.AllDirectories)
                .SelectMany(x => Directory.GetFiles(x, "*.xml", SearchOption.AllDirectories)).ToArray();
            Dictionary<string, string> dict = new Dictionary<string, string>();
            // Matches only when both ends are plain numbers or both end in letters.
            var pat = new Regex(@"^deqn(\d+([a-z]+)?)-(\d+(?(2)[a-z]+))$");
            foreach (var file in path)
            {
                File.Copy(file, file + ".bk", true); // keep a backup
                dict.Clear();
                active_filename = file;
                XDocument doc = XDocument.Load(file, LoadOptions.PreserveWhitespace);
                IEnumerable<XAttribute> list_of_elements = doc.Descendants("disp-formula")
                    .Where(z => (z.Attribute("id") != null) && pat.IsMatch(z.Attribute("id").Value))
                    .Attributes("id");
                foreach (XAttribute ele in list_of_elements)
                {
                    var m = pat.Match(ele.Value);
                    if (m.Success)
                    {
                        var X = m.Groups[1].Value;
                        var Y = m.Groups[3].Value;
                        int Xi, Yi;
                        var isInt = int.TryParse(X, out Xi);
                        if (isInt)
                        {
                            // for deqnX-Y where both X and Y are integers
                            Yi = int.Parse(Y);
                            for (int i = Xi; i <= Yi; i++)
                                dict.Add("rid=\"deqn" + i + "\"", "rid=\"" + ele.Value + "\"");
                        }
                        else
                        {
                            // for deqnX-Y where both X and Y are a number followed by a letter
                            char startCharacter = X[X.Length - 1];
                            char endCharacter = Y[Y.Length - 1];
                            int startNumber = int.Parse(X.Substring(0, X.Length - 1));
                            int endNumber = int.Parse(Y.Substring(0, Y.Length - 1));
                            string alphabet = "abcdefghijklmnopqrstuvwxyz";
                            for (int i = startNumber; i <= endNumber; ++i)
                            {
                                // Only the last number stops at the end letter; earlier ones run to 'z'.
                                int currentCharEnd = (i == endNumber) ? alphabet.IndexOf(endCharacter) : alphabet.Length - 1;
                                for (int j = alphabet.IndexOf(startCharacter); j <= currentCharEnd; ++j)
                                {
                                    dict.Add("rid=\"deqn" + i + alphabet[j] + "\"", "rid=\"" + ele.Value + "\"");
                                }
                                startCharacter = 'a';
                            }
                        }
                    }
                }
                // Read and rewrite the file once, after the dictionary is complete.
                string text = File.ReadAllText(file);
                foreach (KeyValuePair<string, string> element in dict)
                {
                    // search for Key and replace every occurrence with Value
                    text = text.Replace(element.Key, element.Value);
                }
                File.WriteAllText(file, text);
            }
            MessageBox.Show("Done");
        }
        catch (Exception ex)
        {
            MessageBox.Show(string.Format(@"Error in file ({0}), below are the debug details: {1}", active_filename, ex.StackTrace));
        }
    }
Is it possible to make the program more efficient? If so, how can I do that?
Will replacing

    foreach (KeyValuePair<string, string> element in dict)
    {
        text = text.Replace(element.Key, element.Value);
    }

with

    Parallel.ForEach(dict, element =>
    {
        text = text.Replace(element.Key, element.Value);
    });

make any substantial difference?
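One thing to note about the `Parallel.ForEach` variant: its lambda reads and reassigns the shared `text` variable from several threads, so one iteration can overwrite another's result and some replacements may be lost. Both variants also scan the whole text once per dictionary entry. A possible alternative to compare against is a single pass with `Regex.Replace` and a dictionary lookup per match. The sketch below is only an illustration under assumptions: `LinkRewriter` and `Rewrite` are hypothetical names, and the map is assumed to hold bare ids (for example "deqn5" mapped to "deqn5-7") rather than the whole rid="..." strings used in the code above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LinkRewriter
{
    // Matches rid="deqn..." where the id has no hyphen, so links that
    // already point at a range id such as deqn5-7 are left untouched.
    static readonly Regex RidPattern =
        new Regex("rid=\"(deqn[0-9a-z]+)\"", RegexOptions.Compiled);

    // map: individual id -> range id (assumed shape, see the note above).
    public static string Rewrite(string text, IDictionary<string, string> map)
    {
        // The text is scanned once; each match costs one dictionary lookup.
        return RidPattern.Replace(text, m =>
        {
            string target;
            return map.TryGetValue(m.Groups[1].Value, out target)
                ? "rid=\"" + target + "\""
                : m.Value; // unknown id: keep the link as it is
        });
    }
}
```

With this shape the cost depends on the number of rid attributes in the file rather than on the number of dictionary entries. Whether it is noticeably faster for a given set of files would need measuring.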